
With different seeds, can random states repeat in R?


When I run simulations in R, I often write the code so that there is a one-to-one mapping between a seed and a simulation run, rather than setting one seed and specifying the number of repetitions under it.

set.seed(1)
run_simulation()

set.seed(2)
run_simulation()

Compared to

set.seed(1)
run_simulation_ntimes(n)

Can the random state stored in .Random.seed ever be the same for two different seeds, or can the two streams overlap so that the random results are partly identical?

For example, hypothetically:

set.seed(1)
random_number(3)
# .341 .276 .58
set.seed(2)
random_number(3)
# .276 .58 .68

In this hypothetical example, .276 and .58 come from the same generator states under both seeds, just shifted by one position.

I understand that two different random states can produce the same random number. Can two different seeds produce the same random states, at least partially?


Solution

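  • R's default generator (Mersenne-Twister) has a state of 624 32-bit integers and a period of 2^19937 - 1, so it is unlikely that different values of s in set.seed(s) will produce the same random state. However, that is not the only possible problem with the scheme you are using.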

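    If you call runif(n), the n values you receive will appear to be independent under many statistical tests. However, if you generate the n values by calling runif(1) in a loop with sequential seeds, there is no reason to believe they will be a good approximation to independent draws. Generators are designed and tested so that successive outputs of a single stream look independent; there is much less assurance about the first outputs of streams started from neighbouring seeds.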

    This is important, because many uses of n simulated values implicitly assume they are independent. For example, if you want a confidence interval for the mean of the simulated values, the usual CI calculation assumes independence.

    I would guess that most simulations will be fine, but I'd also guess that some won't be, and I doubt you will have any way to know whether yours is okay. So I wouldn't do it that way.

    If you really want each individual simulation n to be reproducible, an easy but slow approach is to set the seed once at the start, run the full simulation n-1 times while ignoring the results, and then run the one that interests you. You can speed this up by saving the random number generator state periodically: for example, every 100 simulations, save .Random.seed to a file. This may use a lot of disk space, because the state takes a bit more than 2500 bytes (626 integers for the default Mersenne-Twister generator), but it saves time because you only need to throw away the simulations since the last checkpoint (at most 99 with a checkpoint every 100) to get to the one you want.
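The question of whether two seeds share generator states can be spot-checked directly. The sketch below assumes R's default Mersenne-Twister generator: it compares .Random.seed right after set.seed(1) and set.seed(2), then looks for overlap between the two streams. Because runif() takes at most 2^32 distinct values, a few single values may coincide by chance (as the question notes); a genuine overlap of streams would instead show up as shared runs of consecutive values, so the sketch counts shared consecutive pairs. The two seeds and the 100,000 draws are arbitrary choices, and a clean result for one pair of seeds is a spot check, not a proof.

```r
# Compare generator states and streams for two neighbouring seeds.
# Assumes R's default generator (Mersenne-Twister).
RNGkind(kind = "Mersenne-Twister")

set.seed(1)
state1 <- .Random.seed
x1 <- runif(1e5)

set.seed(2)
state2 <- .Random.seed
x2 <- runif(1e5)

# Are the full generator states identical right after seeding?
same_state <- identical(state1, state2)

# How many of the 624 state words coincide?
# (.Random.seed[1] codes the RNG kind, .Random.seed[2] is the position index.)
shared_words <- sum(state1[-(1:2)] == state2[-(1:2)])

# Single-value coincidences can happen by chance.
shared_values <- sum(x2 %in% x1)

# Overlapping streams would show up as shared consecutive pairs.
n <- length(x1)
pairs1 <- paste(x1[-n], x1[-1])
pairs2 <- paste(x2[-n], x2[-1])
shared_pairs <- sum(pairs2 %in% pairs1)

str(list(same_state = same_state, shared_words = shared_words,
         shared_values = shared_values, shared_pairs = shared_pairs))
```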
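The contrast between runif(n) from one seed and runif(1) under n sequential seeds looks like this in R. As a crude illustration, the sketch computes the lag-1 correlation of each sequence. A value near zero is necessary but far from sufficient for independence, so passing this (or any handful of tests) does not show the seed-per-value sequence is safe to treat as independent. n = 1000 and seed 42 are arbitrary.

```r
n <- 1000

# One seed, one call: consecutive outputs of a single stream.
set.seed(42)
x_stream <- runif(n)

# One seed per value: the first output of each of n sequential seeds.
x_seeds <- vapply(seq_len(n), function(s) {
  set.seed(s)
  runif(1)
}, numeric(1))

# Crude check: correlation between each value and the next one.
lag1 <- function(x) cor(x[-length(x)], x[-1])
c(stream = lag1(x_stream), seeds = lag1(x_seeds))
```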
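To make the confidence-interval example concrete: the usual normal-approximation interval for the mean divides the standard deviation by sqrt(n), and that step is only justified if the n simulated values are independent. The simulation below (each run returns the mean of 20 Exp(1) draws) is a hypothetical stand-in.

```r
set.seed(1)
n_sims <- 500

# Hypothetical simulation: each run returns the mean of 20 Exp(1) draws.
sims <- replicate(n_sims, mean(rexp(20)))

est <- mean(sims)
# sd / sqrt(n) is the standard error only if the runs are independent;
# positively correlated runs would typically make this interval too narrow.
se <- sd(sims) / sqrt(n_sims)
ci <- est + c(-1, 1) * qnorm(0.975) * se
ci
```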
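The checkpointing approach from the last paragraph can be sketched as follows. Assumptions: run_simulation() is a hypothetical stand-in that draws all of its random numbers from R's global generator; the default RNGkind settings are in use (the "Box-Muller" normal method keeps extra state outside .Random.seed); and the checkpoints are written as .rds files to tempdir() under an arbitrary naming scheme.

```r
# Hypothetical stand-in: any simulation that draws only from R's global RNG.
run_simulation <- function() mean(rnorm(10))

n_sims <- 1000
checkpoint_every <- 100
checkpoint_dir <- tempdir()
checkpoint_file <- function(i) {
  file.path(checkpoint_dir, sprintf("rng_state_%05d.rds", i))
}

set.seed(1)
results <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  if ((i - 1) %% checkpoint_every == 0) {
    # Save the state *before* run i (runs 1, 101, 201, ...).
    saveRDS(get(".Random.seed", envir = globalenv()), checkpoint_file(i))
  }
  results[i] <- run_simulation()
}

# Replay run k: restore the nearest earlier checkpoint, discard the runs
# between it and k, then return run k.
replay <- function(k) {
  start <- ((k - 1) %/% checkpoint_every) * checkpoint_every + 1
  assign(".Random.seed", readRDS(checkpoint_file(start)), envir = globalenv())
  for (j in seq_len(k - start)) run_simulation()
  run_simulation()
}

identical(replay(250), results[250])
```

Replaying run 250 here restores the checkpoint saved before run 201 and discards 49 runs, instead of 249. If everything happens in one R session, keeping the states in a list in memory works the same way without touching the disk.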