
Drawing sample and calculating sample probability from multivariate normal distribution using scipy.stats.multivariate_normal


I would like to do something that is likely very simple but is giving me difficulty: draw N samples from a multivariate normal distribution and calculate the probability of each of those randomly drawn samples. Here I attempt to use scipy.stats, but I am open to using np.random.multivariate_normal as well, whichever is easiest.

>>> import numpy as np
>>> from scipy.stats import multivariate_normal

>>> num_samples = 10
>>> num_features = 6
>>> std = np.random.rand(num_features)

# define distribution
>>> mvn = multivariate_normal(mean = np.zeros(num_features), cov = np.diag(std), allow_singular = False, seed = 42)

# draw samples
>>> sample = mvn.rvs(size = num_samples); sample

# determine probability of each drawn sample
>>> prob = mvn.pdf(x = sample)

# print samples
>>> print(sample)
[[ 0.04816243 -0.00740458 -0.00740406  0.04967142 -0.01382643  0.06476885]
...
 [-0.00977815  0.01047547  0.03084945  0.10309995  0.09312801 -0.08392175]]

# print probability of all samples
>>> print(prob)
[26861.56848337 17002.29353025  2182.26793265  3749.65049331
 42004.63147989  3700.70037411  5569.30332186 16103.44975393
 14760.64667235 19148.40325233]

This is confusing for me for a number of reasons:

Thanks!


Solution

  • I don't use the keyword arguments mean and cov per the docs... Am I missing something?

    No, what you are doing is allowed. The design of the distributions allows both calling the methods with parameters (as you read in the docs) and "freezing" the distribution with parameters and calling the methods without parameters. These are equivalent:

    mean = np.zeros(num_features)
    cov = np.diag(std)
    
    mvn = multivariate_normal(mean=mean, cov=cov, seed=42)
    sample = mvn.rvs(size=num_samples)
    pdf = mvn.pdf(sample)
    
    sample2 = multivariate_normal.rvs(mean=mean, cov=cov, size=num_samples, random_state=42)
    pdf2 = multivariate_normal.pdf(sample2, mean=mean, cov=cov)
    
    np.testing.assert_equal(sample2, sample)  # passes
    np.testing.assert_equal(pdf2, pdf)  # passes
    

  • I would like to convert these numbers to approximate probabilities at that particular point. How can I do this?... I would like to compute the probability within a specific epsilon of the sample value.

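    The values returned by pdf are probability densities, not probabilities, which is why they can exceed 1. To get a probability, you can define a hypercube of side length eps centered at each point and evaluate the probability mass inside it using cdf with its lower_limit argument (requires SciPy 1.10.0+).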

    eps = 0.01
    # the first argument is the upper corner; lower_limit is the lower corner
    mvn.cdf(sample + eps/2, lower_limit=sample - eps/2)
    # array([2.87121214e-14, 1.81736055e-14, 2.33269634e-15, 4.00857084e-15,
    #        4.48976867e-14, 3.95613589e-15, 5.95304832e-15, 1.72140983e-14,
    #        1.57778144e-14, 2.04685939e-14])
    

    You can get approximately the same result by multiplying the probability density by the volume of the hypercube:

    vol = eps**num_features
    pdf * vol
    # array([2.87145307e-14, 1.81751442e-14, 2.33280494e-15, 4.00830854e-15,
    #        4.49021911e-14, 3.95598175e-15, 5.95348449e-15, 1.72142965e-14,
    #        1.57788643e-14, 2.04692967e-14])
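
    Because the covariance in the question is diagonal, the hypercube probability also factorizes into a product of one-dimensional interval probabilities, which gives an independent check of the density-times-volume approximation. The following is a minimal sketch under that assumption; the names rng, var and sd, and the variance range chosen here, are illustrative and not taken from the question.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
num_samples, num_features = 10, 6

# Assumed for illustration: diagonal covariance with variances in [0.1, 1.1)
var = rng.random(num_features) + 0.1
mean = np.zeros(num_features)

mvn = multivariate_normal(mean=mean, cov=np.diag(var), seed=42)
sample = mvn.rvs(size=num_samples)

eps = 0.01
sd = np.sqrt(var)  # per-axis standard deviations

# Probability of [x - eps/2, x + eps/2] along each axis (independent axes)
per_axis = (norm.cdf(sample + eps / 2, loc=mean, scale=sd)
            - norm.cdf(sample - eps / 2, loc=mean, scale=sd))

# Hypercube probability is the product over axes
prob_exact = per_axis.prod(axis=1)

# Density times hypercube volume, as in the answer above
prob_approx = mvn.pdf(sample) * eps**num_features

print(np.max(np.abs(prob_approx / prob_exact - 1)))  # small relative error
```

    This shortcut only applies when the components are independent (diagonal covariance); with correlated components, the cdf call with lower_limit is the general route.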
    

    If you prefer a hyperspherical region, you can multiply by the volume of a hypersphere instead of that of a hypercube. For a 6-dimensional space with eps as the diameter of the hypersphere, vol = np.pi**3/6 * (eps/2)**6.
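
    For dimensions other than 6, the same idea can be sketched with the general n-ball volume formula, V_n(r) = pi^(n/2) / Gamma(n/2 + 1) * r^n. The helper name ball_volume and the identity covariance below are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import multivariate_normal

def ball_volume(n, radius):
    """Volume of an n-dimensional ball of the given radius."""
    return np.pi ** (n / 2) / gamma(n / 2 + 1) * radius ** n

num_features = 6
eps = 0.01  # diameter of the hypersphere around each sample

# For n = 6 this reduces to pi**3 / 6 * (eps / 2)**6
vol6 = ball_volume(num_features, eps / 2)

# Assumed for illustration: standard normal in 6 dimensions
mvn = multivariate_normal(mean=np.zeros(num_features),
                          cov=np.eye(num_features), seed=42)
sample = mvn.rvs(size=10)

# Approximate probability of landing within eps/2 of each sample
prob_ball = mvn.pdf(sample) * vol6
```

    As with the hypercube, this is an approximation that treats the density as constant over the region, so it is only reasonable when eps is small relative to the spread of the distribution.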