Is there a general way to join SciPy (or NumPy) probability distributions to create a mixture probability distribution which can then be sampled from?
I have such a distribution for display using something like:
mixture_gaussian = (norm.pdf(x_axis, -3, 1) + norm.pdf(x_axis, 3, 1)) / 2
which if then plotted looks like:
However, I can't sample from this generated model, as it's just a list of points which will plot as the curve.
Note, this specific distribution is just a simple example. I'd like to be able to generate several kinds of distributions (including "sub"-distributions which are not just normal distributions). Ideally, I would hope there would be someway for the function to be automatically normalized (i.e. not having to do the / 2
explicitly as in the code above.
Does SciPy/NumPy provide some way of easily accomplishing this?
This answer provides a way that such a sampling from a multiple distributions could be done, but it certainly requires a bit of handcrafting for a given mixture distribution, especially when wanting to weight different "sub"-distributions differently. This is usable, but I would hope for method that's a bit cleaner and straight forward if possible. Thanks!
Sampling from a mixture of distributions (where PDFs are added with some coefficients c_1, c_2, ... c_n) is equivalent to sampling each independently, and then, for each index, picking the value from k-th sample, with probability c_k.
The latter, mixing, step can be efficiently done with numpy.random.choice
. Here is an example where three distributions are mixed. The distributions are listed in distributions
, and their coefficients in coefficients
. There is a fat normal distribution, a uniform distribution, and a narrow normal distribution, with coefficients 0.5, 0.2, 0.3. The mixing happens at data[np.arange(sample_size), random_idx]
after random_idx
are generated according to given coefficients.
import numpy as np
import matplotlib.pyplot as plt
distributions = [
{"type": np.random.normal, "kwargs": {"loc": -3, "scale": 2}},
{"type": np.random.uniform, "kwargs": {"low": 4, "high": 6}},
{"type": np.random.normal, "kwargs": {"loc": 2, "scale": 1}},
]
coefficients = np.array([0.5, 0.2, 0.3])
coefficients /= coefficients.sum() # in case these did not add up to 1
sample_size = 100000
num_distr = len(distributions)
data = np.zeros((sample_size, num_distr))
for idx, distr in enumerate(distributions):
data[:, idx] = distr["type"](size=(sample_size,), **distr["kwargs"])
random_idx = np.random.choice(np.arange(num_distr), size=(sample_size,), p=coefficients)
sample = data[np.arange(sample_size), random_idx]
plt.hist(sample, bins=100, density=True)
plt.show()