pythonscipymontecarloscipy.stats

Problems with repeated draws of Sobol sequence in scipy


How do I get unique (uncorrelated) Sobol sequences by making repeated calls to random or random_base2?

I'm trying to repeatedly generate 'random' Sobol sequences, but I'm getting the same sequence each time. My reading of the documentation suggested that repeated calls would give me a different sequence ("To continue an existing design, extra points can be obtained by calling again random_base2.").

Here's my code:

import scipy
engine2 = scipy.stats.qmc.Sobol(d=1, scramble=True)
sample_qmc1 = engine2.random_base2(m=14)
print(sample_qmc1)
sample_qmc2 = engine2.random_base2(m=14)
print(sample_qmc2)

Here's example output:

[[0.2534843 ]
 [0.884218  ]
 [0.69113704]
 ...
 [0.69106254]
 [0.88420477]
 [0.25347106]]
[[0.25342032]
 [0.88416122]
 [0.69107275]
 ...
 [0.69111674]
 [0.88426601]
 [0.2535251 ]]

As you can see, the values are almost the same.

reset() doesn't help, I get the same sequence. Changing to random() doesn't help either. Using smaller sample sizes does seem to work.

If I add this line:

sample_qmc3 = engine2.random_base2(m=14)

I get this error:

ValueError: The balance properties of Sobol' points require n to be a power of 2. 24576 points have been previously generated, then: n=24576+2**14=40960. If you still want to do this, the function 'Sobol.random()' can be used.

I tried LatinHypercube and it doesn't have this problem.

I'm using scipy 1.14.1

Anyone go any ideas how I can generate unique Sobol sequences?


Solution

  • Here's how I solved the problem and an explanation of what I think is going on.

    I needed x sequences, so I did (for x=2):

        engine = scipy.stats.qmc.Sobol(d=2,
                                       scramble=True
                                       )
        random_x = engine.random(n=size).T.reshape(2, size)
    

    This gives me this kind of output:

    array([[0.30097613, 0.90872708, 0.57342084, ..., 0.57336791, 0.90871991,
            0.30101476],
           [0.73375382, 0.18831677, 0.93608302, ..., 0.26994206, 0.60370303,
            0.06809267]])
    

    The two sequences are uncorrelated.

    Here's what I think is happening. For every sequence to be uncorrelated with the other sequences, they have to have 'knowledge' of each other, this is hinted at in the documentation. So when you draw one sequence at a time, there's no knowledge of the prior sequences. When you draw them all at once (as dimensions) they have knowledge of each other, so they're uncorrelated.