I am trying to evaluate the multivariate normal distribution to get the probability that 1 < z1 < 2.178
and 2.178 < z2 < inf
. The mean of the distribution is zero, all variances are 1, and the covariance is 1/sqrt(2)
.
In R
, I get the correct results via:
library(mnormt)
sadmvn(lower=c(1, 2.178), upper=c(2.178, Inf), mean=0, varcov=matrix(c(1, 1/sqrt(2), 1/sqrt(2), 1),2, 2))
In SciPy, I am trying:
from scipy import stats
cov = np.array([[1, 1 / np.sqrt(2)], [1 / np.sqrt(2), 1]])
d = stats.multivariate_normal(mean=np.array([0, 0]), cov=cov)
d.cdf([2.178, np.inf]) - d.cdf([1, 2.178])
but this gives something around 0.14 instead of the correct result ~0.0082. How can I do this correctly?
You can use the lower_limit
parameter of the multivariate_normal.cdf
method to compute the CDF within a hyperrectangle with corners lower_limit
and x
(the first positional argument).
import numpy as np
from scipy import stats
cov = np.array([[1, 1 / np.sqrt(2)], [1 / np.sqrt(2), 1]])
d = stats.multivariate_normal(mean=np.array([0, 0]), cov=cov)
d.cdf([2.178, np.inf], lower_limit=[1, 2.178])
# 0.00820893693689204
It looks like lower_limit
and x
of multivariate_normal
correspond with lower
and upper
of sadmvn
, respectively.
The reason why your original approach did not work is that when cdf
is called without lower_limit
, the lower limit of integration it is assumed to be [-np.inf, -np.inf]
. Therefore, your code was calculating the difference between the CDFs of two hyperectangles, each with lower limit [-np.inf, -np.inf]
. This idea works in 1D, but in 2+ dimensions, this is not the same as the CDF within a single hyperrectangle from corner [1, 2.178]
to corner [2.178, np.inf]
.
For example, in the diagram below, the area enclosed within the rectangle defined by points A and B is not the same as the difference between the areas of the blue and red rectangles.