MRG32k3a Random Number Generator on WebGL2
Introduction
For Monte Carlo simulations, typically it is not necessary to exchange data between each execution, so parallelizing the calculations can lead to efficient simulations, especially when using a GPU. To implement Monte Carlo simulations as a web application using GPU, a possible approach is to use WebGL2. However, since there is no built-in random number generator on OpenGL ES 3.0, we need to implement our own random number generator.
Basically, random number generation (RNG) algorithms sequentially generate values starting from an initial seed, so some trick is needed to parallelize this process. The simplest approach is that each computation thread first skips ahead to different points in the sequence, and then generates values one by one from there (Figure 1). To achieve this, we need an efficient skip algorithm as well as an algorithm for sequentially generating values.
Figure 1: Idea of Parallel Random Number Generation.
In this article, we introduce one of the common RNG algorithms for parallel computation: MRG32k3a. Based on the papers by L’Ecuyer [2] and [3], we will review the outline of the algorithm and how to implement it on WebGL2.
MRG32k3a
Multiple Recursive Generator (MRG) is a type of RNG algorithm that updates the state by calculating a linear combination of past states, as shown in Eq. (1).
The idea of MRG32k3a is to use multiple MRGs to achieve a longer period and better random characteristics. As shown in Eq. (2), MRG32k3a generates random numbers sequentially.
where
Implementation of this algorithm is trickier than it looks. On OpenGL ES 3.0, we can only use 32-bit integers, and since we need to use the full 32 bits, multiplication (also addition) can easily lead to overflow. Therefore, we need to be careful about overflow when implementing multiplication, addition, and modulo operations correctly. To calculate , we can repeatedly add to itself. But to reduce the number of calculations, we can use a method similar to exponentiation by squaring, checking the modulo at each addition.
uint addModM(uint a, uint b, uint m) {
uint amodm = a % m;
uint bmodm = b % m;
uint blim = m - amodm;
if (bmodm <= blim) {
return amodm + bmodm;
}
else {
return amodm - (m - bmodm);
}
}
uint diffModM(uint a, uint b, uint m) {
uint amodm = a % m;
uint bmodm = b % m;
if (amodm >= bmodm) {
return amodm - bmodm;
}
else {
return amodm + (m - bmodm);
}
}
uint multModM(uint a, uint b, uint m) {
uint amodm = a % m;
uint bmodm = b % m;
uint res = 0u;
while (bmodm > 0u) {
if ((bmodm&0x1u) == 0x1u) {
res = addModM(res, amodm, m);
}
bmodm = bmodm >> 1;
amodm = addModM(amodm, amodm, m);
}
return res;
}
Using these functions, we can implement MRG32k3a as shown below.
// MRG32k3a parameters
uint[3] x1 = uint[3](0u, 0u, 1234567u);
uint[3] x2 = uint[3](0u, 0u, 1234567u);
const uint m1 = 4294967087u;
const uint m2 = 4294944443u;
const uint a11 = 1403580u;
const uint a10 = 810728u;
const uint a22 = 527612u;
const uint a20 = 1370589u;
uint stepMRG32k3a(void) {
uint x1i = diffModM(multModM(x1[1], a11, m1), multModM(x1[0], a10, m1), m1);
uint x2i = diffModM(multModM(x2[2], a22, m2), multModM(x2[0], a20, m2), m2);
x1[0] = x1[1];
x1[1] = x1[2];
x1[2] = x1i;
x2[0] = x2[1];
x2[1] = x2[2];
x2[2] = x2i;
return diffModM(x1i, x2i, m1);
}
Skip Algorithm of MRG32k3a
Meanwhile, by describing MRG32k3a in matrix form, the one-step forward can be expressed as shown in Eqs. (5) and (6).
The one-step backward can also be expressed in a similar way.
As we described one-step forward and backward in matrix form, we can skip multiple steps forward or backward by calculating the power of the matrices. Using the exponentiation by squaring, we can efficiently compute the power of the matrix.
As there is no 2D array in OpenGL ES 3.0 and mat3
is only available in float
type, we will use a 1D array of uint
to represent the matrix calculation.
// calculate c = a*b mod m
void matMultModM(uint[9] a, uint[9] b, uint m, out uint[9] c) {
c[0] = addModM(addModM(multModM(a[0],b[0],m), multModM(a[1],b[3],m), m), multModM(a[2],b[6],m), m);
c[1] = addModM(addModM(multModM(a[0],b[1],m), multModM(a[1],b[4],m), m), multModM(a[2],b[7],m), m);
c[2] = addModM(addModM(multModM(a[0],b[2],m), multModM(a[1],b[5],m), m), multModM(a[2],b[8],m), m);
c[3] = addModM(addModM(multModM(a[3],b[0],m), multModM(a[4],b[3],m), m), multModM(a[5],b[6],m), m);
c[4] = addModM(addModM(multModM(a[3],b[1],m), multModM(a[4],b[4],m), m), multModM(a[5],b[7],m), m);
c[5] = addModM(addModM(multModM(a[3],b[2],m), multModM(a[4],b[5],m), m), multModM(a[5],b[8],m), m);
c[6] = addModM(addModM(multModM(a[6],b[0],m), multModM(a[7],b[3],m), m), multModM(a[8],b[6],m), m);
c[7] = addModM(addModM(multModM(a[6],b[1],m), multModM(a[7],b[4],m), m), multModM(a[8],b[7],m), m);
c[8] = addModM(addModM(multModM(a[6],b[2],m), multModM(a[7],b[5],m), m), multModM(a[8],b[8],m), m);
}
Using this function, we implement the matrix exponentiation by squaring.
// calculate b = a**n mod m
void matPowModM(uint[9] a, uint n, uint m, out uint[9] b) {
uint[9] apow = a;
b = uint[9](1u, 0u, 0u, 0u, 1u, 0u, 0u, 0u, 1u);
while (n > 0u) {
if ((n&0x1u) == 0x1u) {
matMultModM(apow, b, m, b);
}
n = n >> 1;
matMultModM(apow, apow, m, apow);
}
}
Finally, the skip algorithm can be implemented as follows.
const uint[9] a1 = uint[9](0u, 1u, 0u, 0u, 0u, 1u, 4294156359u, 1403580u, 0u);
const uint[9] a2 = uint[9](0u, 1u, 0u, 0u, 0u, 1u, 4293573854u, 0u, 527612u);
uint skipMRG32k3a(uint n) {
uint[9] a1pow;
uint[9] a2pow;
matPowModM(a1, n, m1, a1pow);
matPowModM(a2, n, m2, a2pow);
matVecMultModM(a1pow, x1, m1, x1);
matVecMultModM(a2pow, x2, m2, x2);
return diffModM(x1[0], x2[0], m1);
}
References
- T. Bradley, J. du Toit, R. Tong, M. Giles, P. Woodhams, Parallelization techniques for random numbers generators, in: GPU Computing Gems, Gems Emerald ed., 2011, pp. 231–246, 10.1016/B978-0-12-384988-5.00016-4.
- Pierre L’Ecuyer, (1999) Good Parameters and Implementations for Combined Multiple Recursive Random Number Generators. Operations Research 47(1):159-164, 10.1287/opre.47.1.159.
- Pierre L’Ecuyer, Richard Simard, E. Jack Chen, W. David Kelton, (2002) An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research 50(6):1073-1075, 10.1287/opre.50.6.1073.358.
Related Articles
Time- and Temperature-Dependent Heat Load for Ansys Transient Thermal
In this article, we will explore the implementation of time- and temperature-dependent heat loads (surface heat flux and volumetric heat generation) in Ansys Transient Thermal simulations using APDL scripts.
Exporting Ansys Workbench Solutions using APDL Commands
When using Ansys Workbench, users may need to export their solutions in a specific format for post-processing or further analysis. This article provides some examples of how to generate customized output for transient thermal analysis using APDL commands.
Xorshift Random Number Generator
Xorshift is one of the simplest and high-speed random number generator (RNG) algorithms. In this article, we explore the concept of Xorshift, and implement Rust code for searching valid Xorshift parameters for 32-bit binary.