A question that often arises is "How do I generate a random number?". The purpose of this article is to answer that question and look into some of the background behind random numbers.

Simply put, you can get a pseudo-random number by using the Rnd() function. This function returns a double precision number between 0 and 1, inclusive of 0 but exclusive of 1. This means that you could theoretically get a 0 from this function, but you cannot get a 1. You can get close though...like maybe 0.9999999999.

You can use this function to generate a pseudo-random number in any range. For instance, suppose you wish to get a random number between 1 and 100, inclusive. You could write:

i = INT(RND() * 100) + 1

To explain:
a) the RND() function returns a number between 0 and 1 (exclusive of 1)
b) we multiply by 100 to get a number between 0 and 100 (exclusive of 100)
c) we use the INT function to truncate the decimals and get just the integer portion. This gives us a number between 0 and 99 inclusive
d) We add 1 to get a number between 1 and 100

Similarily, if we wanted a number between 5 and 57, we could write:

i = INT(RND() * 53) + 5

A common error is to write the function like:

i = CINT(RND() * 99) + 1

The difference here is that the CINT function rounds fractions above 0.5 up instead of dropping them totally. So we still get a number between 1 and 100, but the probabilities are not equal. You are only half as likely to get a 1 or a 100 as any other number.

Let's look at a simpler example. A function to get a number between 1 and 4.

BAD WAY:
i = CInt( Rnd() * 3) + 1

In this example, rnd() can be a number between 0 and 1 so multiplying by 3 gives us a number between 0 and 3. We then use CINT to round the number to a whole number between 0 and 3, then add 1. So what's the problem? Well, the problem is that the odds are not equal.

Code:

From To Integer Range (Difference between From and To)
0 0.5 0 0.5
0.5 1.5 1 1
1.5 2.5 2 1
2.5 3 3 0.5

From this table, you can see that the chance of getting a 1 or 2 is twice as great as getting a 0 or 3.

GOODWAY:
i = INT(Rnd() * 4) + 1

Code:

From To Integer Range (Difference between From and To)
0 0.999... 0 1
1 1.999... 1 1
2 2.999... 2 1
3 3.999... 3 1

Technically, the range is actually 0.999999999 etc. but essentially, it is 1.

The main point is that the range is the same for all possible outcomes.

Why do I call them Pseudo random numbers instead just plain random numbers anyway?

The reason is that the rnd() function doesn't actually generate a "true" random number. Instead, it uses an algorithm that mixes numbers up and generates a number that appears to be "random", but in actuallity, it is one of a fixed sequence of numbers.

Typically, the function starts with a "seed" number. It uses this seed to calculate a pseudorandom number . This number is then used as the next seed. The initial seed value is the SAME everytime you start your application. So if you print out 5 random numbers, then next time you run your app, you will find that the numbers you get are the same as before.

To prevent this, you explicitly load a seed value with the RANDOMIZE statement. This command inputs a specified seed value, or if no value is given, it uses the current time as a seed value. This way, you get different results everytime you run the program.

You could put a RANDOMIZE statement before every call to Rnd if you want, but it's not necessary and may be counter productive. Remember, if you don't specify a seed value with Randomize, Rnd will use the previous number as a seed. So you have a pseudo random number as a seed to get the next random number. Whereas if you use Randomize all the time, you are using the current time as a seed. A value that is not nearly so random and is, in fact linear in growth and has a small amount of variability.

So how does the RND() function generate a pseudo random number then?

There are a number of methods, but the most common is the Linear Congruential Generator. Basically, it involves multiplying your seed by a constant, adding another constant, and using the MOD operator with yet another constant.

RandomNumber = (SeedValue * FactorConstant + OffsetConstant) MOD ModConstant.

To get a random number between 0 and 1, the number is then divided by ModConstant.

The actual values are fairly important here...

First, to get a reasonable amount of resolution, ModConstant should be quite large. If, for instance, you had a ModConstant value of 4 then your random number generator would only generate 4 possible values. eg. 0, 0.25, 0.5 and 0.75

Second, your FactorConstant should be a fairly large number. When you multiply it by seed value, you should get a number several times larger than modconstant.

Third, Offset constant should be less than ModConstant. Mainly because the mod operation makes larger values pointless. The main function of Offset is to prevent certain errors. For instance, suppose you had

RandomNumber = SeedValue * factorConstand MOD ModConstant.

If SeedValue happened to be 0, you would be locked into 0 forever.

Fourth, your function should not get locked into a cycle, as above. You have to pick your values carefully but a good rule of thumb is to use Prime Numbers.

That concludes this little article. I hope it helps...

"I have a plan so cunning you could put a tail on it and call it a weasel!" - Edmund Blackadder

__________________
"I have a plan so cunning you could put a tail on it and call it a weasel!" - Edmund Blackadder

Last edited by loquin; 10-19-2005 at 05:06 PM.
Reason: changed "it" to "Rnd" to clarify that Rnd uses prior random number as the seed.

The there are 2 basic methods for creating non-recurring random numbers.

1) Generate a random number. Check it against a list of numbers already generated. If it exists, get another one. If it does not, add it to the list.

2) Generate a list of all possible numbers,in sequence. Use the random number generator to mix them up.

Both methods have their advantages and disadvantages and the method you use depends upon your situation. If you are shuffling a deck of cards, you would use the second method. Otherwise, as the number of available cards diminishes, you spend most of your time generating invalid entries. Conversely, if you were generating positions of objects in a large universe, you would use the first method since you don't arn't going to generate all possible values, just a few.

Here is some code to demonstrate the two methods:

Code:

Private Function CreateRandom(ByVal num As Long, ByVal min As Long, ByVal max As Long) As Variant
'returns an array of numbers between where min <= x < max
'returns NULL if there is a problem
Dim aValues() As Long
Dim i As Long, x As Long, span As Long, j As Long, bFound As Boolean
span = max - min + 1
If (span > num) And (num > 0) Then
ReDim aValues(num - 1)
For i = 0 To num - 1
Do
x = Int(Rnd() * span) + min
bFound = False
For j = 0 To i - 1
If x = aValues(j) Then bFound = True: Exit For
Next j
Loop While bFound
aValues(i) = x
Next i
CreateRandom = aValues
Else
CreateRandom = Null
End If
End Function
Private Function ShuffleArray(aData As Variant) As Boolean
'shuffles array adata. Returns true unless there is a problem
Dim max As Long
Dim i As Long, j As Long, v As Variant
If IsArray(aData) Then
max = UBound(aData)
For i = max To 1 Step -1
j = Int(Rnd() * (i + 1)) 'pick one to switch
If i <> j Then
v = aData(i)
aData(i) = aData(j)
aData(j) = v
End If
Next i
ShuffleArray = True
End If
End Function

__________________
"I have a plan so cunning you could put a tail on it and call it a weasel!" - Edmund Blackadder

Sometimes, you want to repeat a string of numbers. This can be useful for debugging code (when you want to repeat an error so you can deal with it) or for encryption (so you can decode a message using the same set of pseudorandom numbers). There are 2 basic ways to do this.

a) If you *don't* use randomize, then by default rnd() will return the same sequence of numbers everytime you RUN the program. However, you need to exit the program and rerun it to "reset" the generator.

b) You can manually reset the generator by using Randomize and passing it a seed value. You then need to call rnd() an pass it a NEGATIVE argument (eg. -1). All subsequent calls to rnd() will then follow the same pattern.

Code:

Private Sub Command1_Click()
Dim i as integer
Randomize 1000 'any seed will do....I'm using 1000 here...
Rnd -1 'any negative number will do
For i = 1 to 100
Debug.Print rnd()
Next i
End Sub

Sometimes, it is useful to know exactly how RND() works. I explained above that it is a linear congruent generator, but I didn't show the actual coefficients, mainly because I didn't know what they are. Now I know and I've reproduced the info here.

From MSDN:

Quote:

MORE INFORMATION
Microsoft Visual Basic uses the linear-congruential method for pseudo-random number generation in the RND function. The following pseudo code documents the algorithm used:
x1 = ( x0 * a + c ) MOD (2^24)

where:

x1 = new value
x0 = previous value (an initial value of 327680 is used by Visual Basic)
a = 1140671485
c = 12820163

The 'MOD' operator in the formula above returns the integer remainder after an integer division.

The expression x1/(2^24) will then return the floating-point number between 0.0 and 1.0 that is returned by the RND function.

Note that this shows that rnd() is precise to 1 part in 2^24, or 16777216. This means that you cannot use it to generate random numbers for ranges greater than this and expect it to return ANY number within that range.

__________________
"I have a plan so cunning you could put a tail on it and call it a weasel!" - Edmund Blackadder

One way to show that the random number generator is reasonably random is to generate a histogram of the results.

You would first create an array of 1000 longs.

Then, generate an random integer between 0 and 999, inclusive. Increment the value in the array element that corresponds to the random number generated.

Repeat the random number generation & array increment for a significant number of samples - say, a million or more. Then, display the results

The Pic below is an example of the random number generator in action - a 1000 element histogram of 10 million samples. On average, for these quantities, there should be a count of 10000 in each histogram 'slot' for a random sampling.

See the attached zip file for an example of this that I threw together.

.

__________________ Lou
"I have my standards. They may be low, but I have them!" ~ Bette Middler
"It's a book about a Spanish guy called Manual. You should read it." ~ Dilbert
"To understand recursion, you must first understand recursion." ~ unknown

You could put a RANDOMIZE statement before every call to Rnd if you want, but it's not necessary and may be counter productive. Remember, if you don't specify a seed value with Randomize, Rnd will use the previous number as a seed. So you have a pseudo random number as a seed to get the next random number. Whereas if you use Randomize all the time, you are using the current time as a seed. A value that is not nearly so random and is, in fact linear in growth and has a small amount of variability.

There have been many posts on the forum regarding the use of multiple RANDOMIZE statements. A recent one got me interested in the effect of using RANDOMIZE repetitively, so I decided to take a look at the results of doing so.

I modified the double random function (& histogram) app I posted in the code library a few months ago by adding a checkbox, and inserting code in the dblRnd function to call RANDOMIZE if the box is checked.

Below is a screen shot of two typical histograms resulting from this; the first one is the 'normal' or unmodified code, where randomize is only called once, and the second shows the effect of calling randomize with every call of rnd.

As can easily be seen, calling randomize clearly adds a non-random element into the random number generation. A perfectly random number generation algorighm should result in a flat line; the closer to a flat line, the more random the number generation process. In the case of repititive randomize calls, the "noise" levels are much higher, indicating that these histogram boxes had many more random numbers than their neighbors. In addition, you see a sinsoidal cyclical effect in the 'random' number generation.

Conclusion: As BillSoo opined, calling RANDOMIZE repetitively absolutely does result in derandomizing the distribution of numbers in the VB random number generation process.
.

__________________ Lou
"I have my standards. They may be low, but I have them!" ~ Bette Middler
"It's a book about a Spanish guy called Manual. You should read it." ~ Dilbert
"To understand recursion, you must first understand recursion." ~ unknown

This "lack of randomness" in the Randomize command occurs because in a standard random number generator there are around 2.15 bil. possible numbers. Technically, for the most part, they are "Linear Congruent Generators" (LCG's) creating a Long integer with about 2.15 bil. possible results.

The Randomize command uses the number of seconds or miliseconds, on the system clock. It almost certainly makes use of the GetTickCount API within "kernel32.dll". This returns the number of 1,000ths of a second since boot up. This sounds pretty good right?

Well, it is, it's real good. But it's NOT as good as having a random number in the range of 1 to 2.15 bil.!

Currently running on my machine, my GetTickCount reads "11,268,175". Now 11.3 million is not bad, but it's a FAR cry from 2.15 billion. If one considers that most computers are in use from 9am to 5pm (or there abouts) you realize that the range of possible inputs is kinda narrow...

On the other hand, LCG's are designed to be VERY sensitive to their inputs. So "seeding" an LCG with 11,268,175 should have a VERY different result than 11,268,176. A 1,000th of a sec. difference can mean a LOT!

Overall, I'm kinda surprised that you guys found a noticeable answer, but the reality is that if using the System Clock alone were good enough as a random number generator, they wouldn't bother creating the LCG!

Loquin reminded me that BillSoo’s posting above shows that VB's random number generator only uses 24 bits and so only has about 16.8 million possible values or "states". VB is using a smaller Linear Congruent Generator ("LCG") than I had realized. When compared to the GetTickCount values I mentioned before, of around 11 million or so, these are really getting very close.

The difference is in the range of possible inputs or Seeds. With the random generator, each previous number is the Seed for the next. The more random the Seed, the more random the result. And with the Seed constantly fed from itself, the results continue to have a nice random pattern.

But if Randomize is called repeatedly, then you are using somewhat sequential seeds like 11,268,175 then maybe 11,268,178 (depending on how much time has passed between calls). This is not the full range of possible inputs by a long shot. Heck, with PC's today at 2,000+ mhz, it's extremely possible to repeatedly seed the same value over and over within the same milisecond! Clearly you don't want to call Randomize every time!

If interested in reading about the theoretics of LCG's, one could read here http://csep1.phy.ornl.gov/rn/rn.html for a pretty good primer. It's very readable. They also get into Seeding/Randomizing routines that use the System Clock & Date in addition to the tick count. (The more variation the better...)

-- Mike

Last edited by Mike Rosenblum; 08-16-2003 at 07:48 PM.

There are 16777216 possible values for the random number generator. Ideally, you should go through all of these values before you start repeating numbers. Rowlek was good enough to run a test of the rnd() function and has found that, indeed, you do go through all 16777216 values before you repeat yourself.

__________________
"I have a plan so cunning you could put a tail on it and call it a weasel!" - Edmund Blackadder

One of the questions we routinely field here is “How do I calculate random numbers that are normally (Gaussian) weighted? “

First – what is meant by normal weighting? What this means is that the probability of the random numbers produced are weighted, such that the probability curve follows a bivariate normal distribution, aka, a Gaussian distribution, aka, the infamous Bell Curve. In other words, you would have a higher probability of getting a number close to the mean than you would further from the mean.

Why would you want this? Well, Gaussian distribution is often useful in modeling real-world activities. For instance, if you were to take a few thousand measurements of a standard using almost any instrument, and record the results, the results data would typically follow a normal distribution. You would have a mean, and most of the data points would fall near the mean, closely grouped. The further from the mean though, the fewer the number of measurements at that point. The more accurate the instrument, the narrower this bell curve would be. Another example, in game theory, is when you fire a projectile - on average, the projectiles fall on the target. However, any given projectile may miss the average, or mean, by some distance. Again, most would be closely clustered around the bullseye, with some further and further away.

The results of VB’s random number generator, on the other hand, follow what is known as a continuous uniform distribution. The probability of getting a value from the rnd function that is less than zero is zero. Likewise, the probability of getting a result from rnd that is greater than 1 is also zero. Between the upper and lower limits, the probability of getting any one value is the same as getting a different value.

Now, it IS possible to transform a continuous distribution set of data into a normal distribution set. Wolfram Research (the authors of Mathematica) outline one of the common methods used – the Box-Muller Transformation.

The actual algorithms, as transposed into VB code, are discussed here and here. The algorithms at the two sites are identical, however, the actual code implementation is somewhat different. (as would be expected.)

I loaded the code from the Ohio State faculty site, (the first link immediately above) and made some modifications for more efficient (and more random) operation; it is attached below. The remainder of this discussion pertains to the altered function.

The code posted in the GetGausse sub will work to return a random number that is weighted per gausian (normal) distribution. Like any VB random number calc, you should only issue the randomize statement once in the life of your app - normally in the form load event. Otherwise, the results may be not as random as you would think (Ref the second histogram, below, and Post #5 in this thread.)

This sub assumes unity sigma, and is centered around zero. Virtually all the values returned by GetGausse will fall within +/- 5 sigma (-5 to +5) although occasionally, fliers will fall outside of this range. To scale it to your purposes, multiply the result by your desired sigma, and add the desired mean.

The efficiency improvements I mentioned above include that the function only needs to actually calculate the gaussian "random" numbers on every other pass. – since it is capable of calculating two, separate Gaussian random numbers on each pass, the function stores one of them internally, in a static variable, for the next time it is called. In addition, I added a scaling function (Normal) which calls the GetGausse function, in which you supply the desired sigma and mean values (they default to 1 and 0, respectively.)

Code:

Private Function Normal(Optional Sigma As Double = 1, Optional Mean As Double = 0) As Double
Normal = GetGausse * Sigma + Mean
End Function
Private Function GetGausse() As Double
' This Function returns a standard Gaussian random number
' based upon the polar form of the Box-Muller transform.
' since this calc is capable of returning two calculations per
' call, it's been set up to save the second calc for the next
' pass through the function, saving some time.
' Call the randomize function once (and ONLY once) in the life of the project.
Static blReturn2 As Boolean ' Flag to calc new values, or return
' previously calculated value. It defaults
' to False on the first pass.
Static dblReturn2 As Double ' Second return value
Dim Work1 As Double, Work2 As Double, Work3 As Double
Const Two = 2#, One = 1#
If blReturn2 Then ' On odd numbered calls
GetGausse = dblReturn2
Else
Work3 = Two
Do Until Work3 < One
Work1 = Two * Rnd - One
Work2 = Two * Rnd - One
Work3 = Work1 * Work1 + Work2 * Work2
Loop
Work3 = Sqr((-(Two) * Log(Work3)) / Work3)
GetGausse = Work1 * Work3
' a second valid value will be returned by Work2 * Work3.
' Calculate it for the next pass. This will save some processing
dblReturn2 = Work2 * Work3
End If
blReturn2 = Not blReturn2 ' and toggle the return value flag
End Function

The images below are the typical output of a little sample app I threw together - it calls the above code 500000 times, and presents the results as a histogram.

The picture box used for display is 500 pixels wide, so the mean was defined as 250 and the sigma as 50 for horizontal scaling. For our purposes, any of the 'fliers' that fall outside the pixel range of 0 to 500 are discarded. Each of the horizontal grid line represent 250 samples which fell on that x-value. Note the non-random element (noise) that is evident in the right-hand histgram - this was caused by calling the Randomize sub internally, within the Get_Guasse function, so that VB's internal random number generator was reseeded every time the function ran.

__________________ Lou
"I have my standards. They may be low, but I have them!" ~ Bette Middler
"It's a book about a Spanish guy called Manual. You should read it." ~ Dilbert
"To understand recursion, you must first understand recursion." ~ unknown

Picking a random number from an arbtrary distribution

I have a proposed addition to the random numbers tutorial (Random Numbers. It includes an example PDF that has (1) an infinite domain, (2) a poorly behaved inverse, (3) a CDF that (sometimes) fails a gradient search for the inverse. However, this code can be used as a baseline for implementing less complicated PDFs.

********** Begin ***********

Picking a random number from an arbitrary distribution can be complicated but is necessary at times.

Consider VB’s built-in random number generator. The linear congruent generator, generates uniformly distributed numbers in the range [0,1]. It has a probability distribution function that is rectangular. None of our other PDF’s of interest will have this useful shape, so what we need is a one-to-one relationship between the LCG and our PDF. This relationship comes from the cumulative distribution function. The cumulative distribution tells us, for a random variable X distributed according to some PDF, the fraction of random numbers, x, that are less than or equal to X.

The nice thing about linear congruent generators is that when we pick a random number, x, it is the value of the CDF at x. More bluntly, if we choose a random number in the interval [0,1], say 0.21, then the fraction of random occurrences in the distribution that are less than 0.21 is 0.21.

Well, DUH! But what does that have to do with MY PDF? Our PDF has a cumulative distribution function as well (which is by definition monotonically increasing). The previous bit of fairly obvious information allows us to associate the LCG distribution with ours. If I pick a number x from distribution X and calculate the CDF of X at x and plug the result into the inverse of my CDF, I’ve effectively mapped Distribution X to my PDF.

Plain and simple… The following code will produce a random number, x, picked from any CDF. All you need is the inverse!

Code:

x = CDFInverse(Rnd)

Example:

I have a function that looks like: y = (x+1)*exp(-a*(x+1)) - exp(-a*(x+1)). I want to create a PDF that is shaped like this. The first thing we have to do is normalize it such that the total area under the curve is 1 (i.e. the sum of the likelihoods of each individual event is 100%). We do this by integrating the curve over the domain. For this function, my domain of interest is [0,infinity). Integrating it over this range by parts reveals that the area is 1 / (a^2 * exp(a)). So, I divide y by this value, thereby normalizing it:

Code:

y = ((x+1)*exp(-a*(x+1)) - exp(-a*(x+1))) * (a^2 * exp(a))

Now that I have a PDF, I need to calculate the associated CDF and invert it. CDF(z) is obtained by integrating from the beginning of the domain to z:

Code:

CDF(z) = 1 – exp(-a*z) – a*z*exp(-a*z)

This is a well-known form and is therefore easily inverted:

Code:

CDFInverse(x) = -(W((x-1) / exp(1)) + 1) / a

Where W(x) is the Lambert W-function (also known as the omega function). But now we have a problem. This solution is only valid over part of the range of the function. There are several ways around this. In this example I numerically compute the inverse using Newton’s Method (over the part of the range where it converges) and the bisection algorithm over the rest.

The following code should be able to assist you in using almost any PDF. Some PDFs (especially those with infinite domains) are not easy to implement. I gave this example just to show it's doable. In my example PDF, the parameter "a" is represented by the first parameter in the ParamArray passed to CDFInverse.

Code:

Option Explicit
Public Enum PDFs
CustomPDF1 = 0
End Enum
Public Type Bin
RightEdge As Double
Count As Long
End Type
Public Function CDFInverse(x As Double, PDF As PDFs, ParamArray Parameters()) As Double
Select Case PDF
Case CustomPDF1
'CDFInverse = (LambertW((x - 1) / Exp(1)) + 1) / Parameters(0)
CDFInverse = NewtonSolve(x, PDF, Parameters(0))
End Select
End Function
Public Function CDF(x As Double, PDF As PDFs, ParamArray Parameters()) As Double
Select Case PDF
Case CustomPDF1
CDF = 1 - Exp(-Parameters(0) * x) - Parameters(0) * x * Exp(-Parameters(0) * x)
End Select
End Function
Public Function NewtonSolve(ByVal x As Double, PDF As PDFs, ParamArray Parameters()) As Double
Dim Delta As Double, dx As Double, x1 As Double, x2 As Double, x3 As Double
Select Case PDF
Case CustomPDF1
Delta = 0.00000001
x1 = 2
'use newton's method
Do
dx = -(CDF(x1, PDF, Parameters(0)) - x) / DiffCDF(x1, PDF, Parameters(0))
x1 = x1 + dx
' This likes to diverge, so exit unless it's in a range where I know it
' will converge.
If Abs(dx / x1) < Delta Or x1 < 0 Or x1 > 10 Then Exit Do
Loop
If x1 < 0 Then
'newton diverged
'use bisection algorithm
x1 = 0
x2 = 10
Do Until CDF(x2, PDF, Parameters(0)) > 0
x2 = x2 * 2
Loop
x3 = x2 / 2
Do
x3 = CDF(x3, PDF, Parameters(0)) - x
If x3 > Delta Then
x2 = (x2 + x1) / 2
ElseIf x3 < -Delta Then
x1 = (x2 + x1) / 2
Else
Exit Do
End If
x3 = (x2 + x1) / 2
Loop
End If
End Select
NewtonSolve = x1
End Function
Public Function DiffCDF(x As Double, PDF As PDFs, ParamArray Parameters()) As Double
Dim Delta As Double
Delta = 0.00000001
'central difference derivative
Select Case PDF
Case CustomPDF1
DiffCDF = (CDF(x + Delta, PDF, Parameters(0)) - CDF(x - Delta, PDF, Parameters(0))) / 2 / Delta
End Select
End Function
Public Sub Main()
Dim bins(30) As Bin
Dim x As Double
Dim j As Long
Dim i As Long
Const NumTrials = 10000
For i = 0 To 29
bins(i).RightEdge = (i + 1) / 3
'Debug.Print CDFInverse(Rnd, 1)
Next
bins(30).RightEdge = 1.797E+308
For i = 1 To NumTrials
x = CDFInverse(Rnd, CustomPDF1, 1)
j = 0
While x > bins(j).RightEdge
j = j + 1
Wend
bins(j).Count = bins(j).Count + 1
Next
For i = 0 To 30
Debug.Print Format(bins(i).RightEdge, "0.000E+00"), bins(i).Count / NumTrials
Next
End Sub

You can add your own distributions to this by (1) adding a name for it into the enum and (2) creating an inverse for it. Directly invertible functions should be directly inverted.

Edit: Added a histogram comparing output to the actual distribution.

__________________
To err is human; to debug, divine.

Last edited by reboot; 09-20-2006 at 07:06 AM.
Reason: Fixed error per the author

The ASP.NET 2.0 Anthology
101 Essential Tips, Tricks & Hacks - Free 156 Page Preview. Learn the most practical features and best approaches for ASP.NET. subscribe