First some background on my situation:
I need a random triangular distribution and was planning on using Python's random.triangular. The following is the source code (Python 3.6.2):
def triangular(self, low=0.0, high=1.0, mode=None):
    """Triangular distribution.

    Continuous distribution bounded by given lower and upper limits,
    and having a given mode value in-between.

    http://en.wikipedia.org/wiki/Triangular_distribution

    """
    u = self.random()
    try:
        c = 0.5 if mode is None else (mode - low) / (high - low)
    except ZeroDivisionError:
        return low
    if u > c:
        u = 1.0 - u
        c = 1.0 - c
        low, high = high, low
    return low + (high - low) * (u * c) ** 0.5
I reviewed the referenced wiki page and found that my desired use (low=0, high=1, mode=0) is a special case that simplifies things and can be implemented with the following function:
import random

def random_absolute_difference():
    return abs(random.random() - random.random())
Doing some quick timings reveals a significant speedup with the simplified version (this operation will be repeated far more than a million times each time my code runs):
>>> import timeit
>>> timeit.Timer('random.triangular(mode=0)','import random').timeit()
0.5533245000001443
>>> timeit.Timer('abs(random.random()-random.random())','import random').timeit()
0.16867640000009487
So now for the question: I know Python's random module only uses pseudo-randomness, and random.triangular uses one random number while the special case code uses two random numbers. Will the special case results be significantly less random because they use two consecutive calls to random, while random.triangular only uses one? Are there any other unforeseen side effects of using the simplified code?
Edit: In reference to this solution to a different question, I created histogram plots for both distributions, showing that they are comparable.
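The plots themselves are not reproduced here, but here is a minimal sketch of how a comparable side-by-side comparison could be produced (assuming matplotlib is installed; the sample and bin counts are illustrative):

import random
import matplotlib.pyplot as plt

n_samples = 1_000_000  # illustrative sample size

builtin = [random.triangular(mode=0) for _ in range(n_samples)]
simplified = [abs(random.random() - random.random()) for _ in range(n_samples)]

# Plot both sample sets side by side on a shared y-axis.
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.hist(builtin, bins=100)
ax1.set_title('random.triangular(mode=0)')
ax2.hist(simplified, bins=100)
ax2.set_title('abs(random() - random())')
plt.show()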
In your case (mode=0 with the default low=0.0 and high=1.0), c is 0, so the u > c branch is taken whenever u > 0 and low and high are swapped; triangular then boils down to the following expression (with u the original uniform draw):
1 + (0 - 1) * ((1.0 - u) * (1.0 - c)) ** 0.5
And then further to:
1 - 1 * ((1.0 - u) * 1.0) ** 0.5
And then further to:
1 - (1.0 - u) ** 0.5
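As a sketch, this reduced expression could be wrapped as a drop-in replacement for the mode=0, low=0.0, high=1.0 case (the function name below is mine, not part of the standard library):

import random

def triangular_mode_zero():
    # Same distribution as random.triangular(low=0.0, high=1.0, mode=0),
    # reduced as derived above; uses a single uniform draw.
    return 1.0 - (1.0 - random.random()) ** 0.5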
And with my timings, this last expression runs much faster than random.triangular(mode=0) and has comparable speed to abs(random.random()-random.random()). Note that triangular contains a try/except statement, which may explain some of the performance difference (for example, replace that statement with just "mode = 0" and see).
import timeit
timeit.Timer('random.triangular(mode=0)','import random').timeit()
timeit.Timer('1 - (1.0 - random.random()) ** 0.5','import random').timeit()
timeit.Timer('abs(random.random()-random.random())','import random').timeit()
However, I don't see a reason why using two random numbers instead of one would produce a "less random" triangular-distributed number, as long as the two methods produce the same distribution. In fact, using two random numbers gives you a greater variety of triangular-distributed numbers than one alone, since there are more bits of randomness available for this purpose. (In case you want to test the two methods for correctness, you can do so using the Kolmogorov–Smirnov test along with the CDF of the triangular distribution, since the triangular distribution is absolutely continuous. This test is implemented, for example, in SciPy as scipy.stats.kstest. If several runs of the test return a p-value extremely close to 0, that strongly indicates that the numbers come from the wrong distribution.)
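For example, here is a minimal sketch of that check, assuming SciPy is installed (scipy.stats.triang with shape parameter c=0, loc=0, scale=1 is the triangular distribution on [0, 1] with its mode at 0):

import random
from scipy import stats

# Samples from the simplified generator.
samples = [abs(random.random() - random.random()) for _ in range(100_000)]

# Compare the empirical samples against the triangular CDF with mode at 0.
triangular_cdf = stats.triang(c=0, loc=0, scale=1).cdf
statistic, p_value = stats.kstest(samples, triangular_cdf)
print(statistic, p_value)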