This page demonstrates the pitfalls of using a pseudorandom number generator with a small native range, namely the standard rand()
function of Microsoft Visual C++, to generate output values within arbitrary target ranges.
 Overview of range projection mechanisms
 Test Programs with documentation and download
 Sample Test Results measured on my systems
 Test Conclusions drawn from these results
Addendum 20131217: Stephan T. Lavavej’s presentation rand() Considered Harmful discusses the range projection trap of Visual C++ rand()
, along with many other issues, and demonstrates the new C++ 11 facilities which fix them (available in VC++ 2013). Eric Lippert’s analysis How much bias is introduced by the remainder technique? nicely visualizes the distortion.
Overview
Random number generators typically generate values within a certain integer range, defined as [0 … RAND_MAX
] in standard C libraries. To use these values in an application, it is usually necessary to project them to an arbitrary target range [0 … n1], or possibly a shifted range [m … m+n1] which is equivalent to the unshifted range for the purpose of this discussion. The two most commonly used projection methods are modulo (actually remainder but that’s not relevant here) and division:
r_mod = rand() % n; r_div = (unsigned) ((double) rand() / (RAND_MAX+1) * n);
Many authors recommend against using the modulo method as the random values generated by some algorithms contain loworder bits (which the modulus operator isolates) that are not particularly random at all. These authors then usually proceed to recommend the division method.
However, both methods share the same and much worse flaws, which become apparent and relevant to practical applications if the random number generator has a small “native range”, i.e. a small RAND_MAX
:

If n >
RAND_MAX
+1, some values in the target range [0 … n1] cannot be generated by either the modulo or the division projection. 
If n is not an even multiple of
RAND_MAX
+1 or vice versa, some values in the target range will be generated much more frequently than others (twice as often or more).
This is due to the fact that values in the target range are mapped to a varying number of values in the native range. Each target value receives a probability of x/(RAND_MAX
+1) where x may be different for different target values and may even be zero in the first case, rather than its expected probability of 1/n for an ideal distribution.
Although this fact must seem obvious to professional mathematicians, I confess that I only learned of it from a brief remark in the excellent Accelerated C++ by Andrew Koenig and Barbara E. Moo (AddisonWesley 2000, p.135, section 7.4.4). Many programmers seem to be unaware of this issue, likely because most libraries provide generators with a large RAND_MAX
(typically 2^{32}1). This does not make the flaws of the modulo and division projection disappear but it does render them sufficiently small as to be irrelevant in practice.
Bucketing Algorithm
The solution – if you are either stuck using a generator such as Visual C++ rand()
with a small RAND_MAX
or if you need a perfect distribution – is to use a bucketing algorithm for projection, as suggested by Koenig/Moo. The one provided in our test program is somewhat more complex as it handles ranges greater than RAND_MAX
via recursive calls. Please see function RandBucket
in file RandTest.c
for this algorithm.
Bucketing retains the exact distribution quality produced by the underlying random number generator for all target ranges, as opposed to modulo/division projection which can produce a significantly worse distribution. Bucketing’s main drawback is its slower speed because (1) random values may have to be regenerated until one hits a bucket, and (2) target ranges greater than RAND_MAX
require recursive calls.
On the rest of this page we’ll examine just how badly modulo and division projection can damage the output distribution of a generator with a small native range, and how much of a performance cost we have to pay for bucketing to avoid this problem.
Test Programs
All results shown below were obtained with two small test programs. The download package RandTest.zip (65.6 KB, ZIP archive) comprises the precompiled executables and their complete source code. Please refer to the enclosed ReadMe.txt
file and the compilation batch file for the required development tools and expected file paths.
The principal test program is RandTest.c
, written to test the rand()
library function of Visual C++, a pseudorandom number generator with a small RAND_MAX
of only 2^{15}1. We also perform control tests with other generators. The result table identifies the various test combinations as follows:
 Modulo — Visual C++
rand()
with modulo projection  Division — Visual C++
rand()
with division projection  Bucketing — Visual C++
rand()
with bucketing projection  Secure — Visual C++
rand_s()
with modulo projection  Twister — Visual C++ Mersenne Twister with modulo projection
 C# Random — Visual C#
System.Random
with modulo projection  C# Twister — Visual C# Mersenne Twister with modulo projection
The “secure” Windows generator rand_s()
is an external library function available on Windows XP and later. The System.Random
class is the standard generator of the .NET Framework. Finally, I added C and C# versions of the famous Mersenne Twister algorithm (suggested by Nathan Mates). The native range of rand_s()
and Mersenne Twister is [0 … 2^{32}1] while that of System.Random
is [0 … 2^{31}1].
Invocation and Output
The C program RandTest
and the C# program CSharpTest
are each called with two parameters:
 The upper bound n of the target range, indicating a range of [0 … n1]
 The iteration count, indicating the number of random values to be generated
The program counts how many times each value of the target range is “hit” by a random number – ideally, each should receive the same number of hits – and then computes the standard deviation for this series. Additionally, the time spent in the random number generation loop is displayed.
Standard deviation is by no means an exhaustive test of the quality of a random number generator, and I’m no expert in this area anyway. However, as we shall see the distribution produced by a generator with a small native range is so massively flawed for certain target ranges that the standard deviation is sufficient to make our point.
The difference in standard deviation becomes more pronounced as the ratio of iteration count to target range increases. The test is useless if the iteration count is no greater than the target range, as many target values would be missed even if the range projection method would not introduce any distribution errors.
Additionally, the distribution distortion of the modulo and division projections only appear for target ranges approaching or exceeding the generator’s native range. So in order to discover this distortion with a native range of [0 … 2^{31}1] or greater, at least one billion hit counters must be kept in memory and at least ten billion iterations performed. I did not attempt to conduct such an experiment.
Sample Test Results
The following tables show two sets of sample test results on my systems. All tests were conducted with 100 million iterations for the indicated target range n. Each test result shows the standard deviation from a perfect hit distribution, followed by the execution time in seconds. The programs were compiled with Visual C++ and Visual C# from Microsoft Visual Studio. Both programs run in 32bit mode and use full optimization (/Ox, /o) with unchecked arithmetic and no debug information.
The first set of results was obtained on 20 May 2012 (original publication date). The system was Windows 7 SP1 (64 bit) with Visual Studio 2010 Professional SP1 and .NET Framework 4.0, running on an Intel DX58SO motherboard with an Intel Core i7 920 CPU (2.67 GHz) and 6 GB RAM (DDR31333).
n  Modulo  Division  Bucketing  Secure  Twister  C# Random  C# Twister  

100  1,625  2.55  1,727  2.87  877  2.77  938  187.67  946  1.00  1014  1.76  1018  2.25  
10,000  1,368  2.52  1,369  2.83  98  3.20  99  187.84  100  1.07  101  1.83  100  2.07  
16,384  74  2.53  77  2.84  76  2.75  78  187.93  78  1.09  79  1.74  78  2.08  
20,000  1,468  2.52  1,468  2.85  68  5.61  70  188.49  70  1.09  71  1.80  70  2.26  
32,768  54  2.54  54  2.87  54  2.75  55  188.09  55  1.08  55  1.79  56  2.11  
50,000  1,451  2.49  1,451  2.74  42  6.83  45  190.25  45  1.08  45  1.86  45  2.11  
65,536  1,526  2.47  1,526  2.75  37  5.05  39  190.45  39  1.09  39  1.87  39  2.11 
The second set of results was obtained on 17 September 2015. The system was Windows 10 Pro (64 bit) with Visual Studio 2015 Community and .NET Framework 4.6, running on a Dell XPS 15 notebook (model 9530) with an Intel Core i7 4712HQ CPU (2.30 GHz) and 16 GB RAM (DDR31600).
n  Modulo  Division  Bucketing  Secure  Twister  C# Random  C# Twister  

100  1,752  4.57  1,815  5.66  1,114  4.90  991  43.43  998  1.59  959  2.82  962  3.10  
10,000  1,369  4.99  1,369  5.96  98  5.80  101  44.69  100  1.66  100  2.84  101  3.15  
16,384  75  4.97  76  6.47  76  5.24  78  44.61  78  1.67  77  2.85  78  3.15  
20,000  1,468  5.05  1,468  6.03  68  9.22  71  44.72  71  1.68  71  2.86  70  3.18  
32,768  54  5.04  54  5.86  54  5.27  55  45.06  55  1.70  55  2.88  55  3.26  
50,000  1,451  5.10  1,451  6.41  42  12.84  45  45.22  45  1.82  45  2.96  45  3.41  
65,536  1,526  5.10  1,526  6.35  37  9.78  39  45.42  39  1.88  39  3.03  39  3.54 
The relative timings were quite similar across systems, so we can summarize both tables together.
Distribution
Modulo and Division both show the same unacceptable degradation in distribution quality for target ranges that are either larger than Microsoft’s RAND_MAX
=32,767 or not a factor of RAND_MAX
+1. Only the results for n=16,384 and n=32,768 are comparable in quality to Bucketing and to generators with a larger native range.
However, this effect is not as pronounced for target ranges far smaller than RAND_MAX
. Test case n=100 shows a deterioration of only ca. 160200% compared to at least an order of magnitude for larger target ranges. This is also presumably the reason why the distribution of the other tested generators was not significantly degraded by the simple modulo projection.
Performance
As if the small native range wasn’t bad enough, Microsoft’s rand()
is also strangely slow. The C version of the superior Mersenne Twister runs over twice as fast, and even both tested C# algorithms are significantly faster! Using Bucketing to correct the target range distribution can further slow down rand()
by up to a factor of two whenever such a correction is actually necessary.
The Secure algorithm is not just slow, it’s incredibly slow. Depending on the software environment, 1–2 orders of magnitude separate the execution time of Microsoft’s rand_s()
function from the rest of the field. Perhaps the quality of the generated random number sequence is worth the wait in a securitycritical context, but you certainly wouldn’t want to call this function for any other purpose.
Test Conclusions
Never use the rand()
function of Microsoft Visual C++. This generator is slow to begin with, and its small RAND_MAX
necessitates the use of bucketing for many output ranges, making it even slower. Generally, avoid pseudorandom number generators with small native ranges and prefer ones with larger ranges, e.g. Mersenne Twister.
Use bucketing projection with any generator for target ranges near your generator’s native range where distribution quality could become a problem, or for any target range if you need a perfect distribution quality. Bucketing guarantees as good a distribution as your generator can offer, but at the price of a significantly reduced speed for certain ranges.
Use a generator with a large native range if you are confident that your target ranges will always be much smaller than the generator’s native range. Such a generator will not require bucketing to produce a good distribution in practical applications, and depending on the distribution quality of its least significant bits it can even work well with the fast & simple modulo projection.