I have a problem using SciPy’s Quasi-Monte Carlo features for function estimation. I want to build survival functions for functions of several variables, so I'm using something like this:
`Y = c1*f1(X1) + c2*f2(X2) + c3*f3(X3) + …`
Where:

- `c1, c2, c3, …` are constants,
- `f1, f2, f3, …` are functions (mostly linear, but not always),
- `X1, X2, X3, …` are independent but not identically distributed random variables; in fact, the distributions are mostly different.
For “normal” Monte Carlo, I have the problem solved: I just generate random numbers from the different distributions and plug them into the equation above. But for quasi-Monte Carlo, I’m stuck.
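A minimal sketch of that plain Monte Carlo approach, assuming two example terms (a log-logistic/`fisk` variable and a normal variable; the distributions, constants, and functions are placeholders for illustration):

```python
import numpy as np
from scipy import stats

size = 2**10
rng = np.random.default_rng(0)

# Draw iid samples from each (different) distribution.
x1 = stats.fisk(3.9).rvs(size=size, random_state=rng)
x2 = stats.norm.rvs(size=size, random_state=rng)

# Combine via the model Y = c1*f1(X1) + c2*f2(X2) + ...
# (here c1 = c2 = 1 and f1, f2 are the identity, purely for illustration).
y = x1 + x2
```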
If the random variables all had the same distribution, I could do something like this (`x_rvs` being my random variables and there being `no_dims` of them):
```python
import scipy.stats

no_dims = 3
size = 2**10
engine = scipy.stats.qmc.Sobol(d=no_dims, scramble=True)
dist = scipy.stats.fisk(3.9)
base = scipy.stats.sampling.NumericalInverseHermite(dist)
x_rvs = base.qrvs(size=size, qmc_engine=engine)
```
I could then use my `x_rvs`. The problem is, my distributions are all different.
One option is to generate uniform samples over the range [0, 1] and map them to the other distributions. However, given that the QMC documentation makes a big deal of supporting different distributions, I can’t help feeling there’s a way of directly generating the random variables I need. I know I can’t just create one QMC Sobol engine and use it repeatedly to generate the different distributions (the sequences I get that way are far too correlated to be of any use), so I know I have to use multiple dimensions in my engine.
Anyone got any ideas for how to generate multiple quasi-random variables with different distributions that keep the nice properties of QMC?
> However, given that the QMC documentation makes a big deal of supporting different distributions...
At this time, `scipy.stats.qmc` does not have features for directly generating QMC sequences from any distributions other than the uniform, multinomial, and multivariate normal distributions. You already found the `qrvs` method of classes in `scipy.stats.sampling`; that's the only other thing SciPy has in existing releases.
SciPy 1.15 will have a new random variable infrastructure. You will be able to use `make_distribution` to convert (almost) any existing SciPy continuous distribution to the new interface, whose `sample` method will accept either a NumPy `Generator` or a SciPy `QMCEngine` as the argument for the `rng` parameter. When it receives a `QMCEngine`, it performs inverse transform sampling to generate QMC samples from other distributions. (I believe this is what you were referring to with "One option is to use [uniform] distributions over the range [0,1] and map them to other distributions".) SciPy 1.15 RC1 is available now, and the relevant documentation is here.
> I know I can’t just generate quasi-random data by using the QMC Sobol engine and using it repeatedly to generate different distributions (the sequences I get are too correlated if I do this and of no use)
Yes, good not to fall into that trap!
> so I know I have to use multiple dimensions in my engine.
That is one option: create a multivariate uniform sample with one dimension per term in your sum, then map the coordinates of each dimension through the appropriate inverse CDF to get an "independent" 1-D low-discrepancy sample per variable.
```python
import numpy as np
from scipy import stats

size = 2**10
rng = np.random.default_rng(235855178283348532492417763208635684641)
engine = stats.qmc.Sobol(d=2, scramble=True, seed=rng)

# One uniform dimension per random variable in the sum
x_uniform = engine.random(size)

# Map each dimension to its target distribution via the inverse CDF
x_rvs1 = stats.fisk(3.9).ppf(x_uniform[:, 0])
x_rvs2 = stats.norm.ppf(x_uniform[:, 1])
x_rvs = x_rvs1 + x_rvs2

ecdf = stats.ecdf(x_rvs)
ecdf.sf.plot()
```
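As a quick sanity check that drawing both coordinates from one 2-D engine preserves the low-discrepancy property, you can compare `scipy.stats.qmc.discrepancy` for the Sobol' sample against an iid uniform sample of the same size (the seed and sample size here are arbitrary):

```python
import numpy as np
from scipy import stats

size = 2**10
rng = np.random.default_rng(1)

sobol = stats.qmc.Sobol(d=2, scramble=True, seed=rng).random(size)
iid = rng.random((size, 2))

# Centered discrepancy: lower means the points fill the unit
# square more evenly.
d_sobol = stats.qmc.discrepancy(sobol)
d_iid = stats.qmc.discrepancy(iid)
print(d_sobol < d_iid)  # the Sobol' points are far more evenly spread
```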
Another option (which may not be as low-discrepancy overall) is to create each `QMCEngine` from a new `Generator` with a distinct seed and `scramble=True`. SciPy has a private `_rng_spawn` function for this purpose. It is used in `qmc_quad` to get (for most practical purposes) independent estimates of the integral, from which it estimates the error. There was a discussion about this approach here.
So in SciPy 1.14 that might look like:
```python
import numpy as np
from scipy import stats

def _rng_spawn(rng, n_children):
    # https://github.com/scipy/scipy/blob/d916ddbab38468312fd0bee53ff68d5276f4f84f/scipy/_lib/_util.py#L1025
    # spawns independent RNGs from a parent RNG
    bg = rng._bit_generator
    ss = bg._seed_seq
    child_rngs = [np.random.Generator(type(bg)(child_ss))
                  for child_ss in ss.spawn(n_children)]
    return child_rngs

size = 2**10
# high-entropy seed generated with np.random.SeedSequence
rng = np.random.default_rng(235855178283348532492417763208635684641)
rngs = _rng_spawn(rng, 2)

engine = stats.qmc.Sobol(d=1, scramble=True, seed=rngs[0])
dist = stats.fisk(3.9)
base = stats.sampling.NumericalInverseHermite(dist)
# Permute the order to avoid artifacts
x_rvs1 = rng.permutation(base.qrvs(size=size, qmc_engine=engine))

engine = stats.qmc.Sobol(d=1, scramble=True, seed=rngs[1])
dist = stats.norm()
base = stats.sampling.NumericalInverseHermite(dist)
x_rvs2 = rng.permutation(base.qrvs(size=size, qmc_engine=engine))

# Regular MC samples, for comparison:
# x_rvs1 = stats.fisk(3.9).rvs(size=size)
# x_rvs2 = stats.norm.rvs(size=size)

x_rvs = x_rvs1 + x_rvs2
ecdf = stats.ecdf(x_rvs)
ecdf.sf.plot()
```
For generating an empirical survival function, you might find `scipy.stats.ecdf` useful (the last two lines of each example above).
Both of these look a lot nicer than what you'd get with regular MC for a given number of points. For example, with `2**6` points: