I would like to fit a copula to a dataframe with 2 columns: a and b. Then I have to calculate the conditional probability of a < 0 when b < -2, i.e. P(a < 0 | b < -2).
I have tried the following code in Python using the copulas library. I am able to fit the copula to the data, but I am not sure how to calculate the cdf:
import pandas
import copulas.multivariate
df = pandas.read_csv("filename")
cop = copulas.multivariate.GaussianMultivariate()
cop.fit(df)
I know the cdf function can be used to calculate the conditional probability, but I am not fully sure how to use it here.
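The cdf method takes an array with one row per point and returns one value per row: the joint cumulative probability that every column is at or below that row's values. On its own it gives P(a < 0, b < -2), not the conditional probability, so the result still has to be divided by P(b < -2).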
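Written out, the quantity in the question is a ratio of a joint probability to a marginal one. Here F_{a,b} stands for the joint cdf of the fitted model and F_b for the marginal cdf of b; both symbols are notation introduced for this sketch, not names from the library:

```latex
P(a < 0 \mid b < -2)
  = \frac{P(a < 0,\ b < -2)}{P(b < -2)}
  = \frac{F_{a,b}(0,\,-2)}{F_b(-2)}
```

For continuous variables the strict and non-strict inequalities give the same probabilities, so evaluating the cdf at the bounds is enough.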
Give this code a try:
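import numpy as np
# a single evaluation point with the two upper bounds, in df column order (a, b)
inputs = np.array([[0.0, -2.0]])
# the joint cdf at that point is P(a < 0, b < -2)
p_a_and_b = cop.cdf(inputs)
# approximate P(b < -2) by moving the bound on a far above the observed data
a_high = df['a'].max() + 10 * df['a'].std()
p_b = cop.cdf(np.array([[a_high, -2.0]]))
# P(a < 0 | b < -2)
conditional_prob = p_a_and_b / p_b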
Another possible approach (a bit more formal):
# Calculate P(A and B) = P(a < 0, b < -2) from the joint cdf
p_a_and_b = cop.cdf(np.array([[0.0, -2.0]]))
# Calculate P(B) = P(b < -2) from the marginal fitted for column b
# (assumes cop.univariates holds the fitted marginals in the same order as df.columns)
marginal_b = cop.univariates[list(df.columns).index('b')]
p_b = marginal_b.cdf(np.array([-2.0]))[0]
# Calculate P(A|B)
conditional_prob = p_a_and_b / p_b
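As a sanity check on whatever number comes out, the same ratio P(A and B) / P(B) can be estimated by plain counting over samples. The sketch below uses only NumPy and synthetic correlated normals. The helper name conditional_prob_from_samples, the correlation rho = 0.6 and the sample size are assumptions made for illustration, not part of the copulas library or of your data.

```python
import numpy as np


def conditional_prob_from_samples(a, b, a_max=0.0, b_max=-2.0):
    """Estimate P(a < a_max | b < b_max) as a ratio of sample counts."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    in_b = b < b_max                    # event B: b below its bound
    if not in_b.any():
        raise ValueError("no samples satisfy the condition on b")
    in_a_and_b = in_b & (a < a_max)     # event A and B together
    return in_a_and_b.sum() / in_b.sum()


# Synthetic stand-in data: two standard normals with an assumed correlation.
rng = np.random.default_rng(0)
rho = 0.6
cov = [[1.0, rho], [rho, 1.0]]
samples = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

estimate = conditional_prob_from_samples(samples[:, 0], samples[:, 1])
print(estimate)
```

With real data you could pass df['a'] and df['b'] directly, although few rows may satisfy b < -2. Copulas models also offer a sample method, so draws from the fitted cop could be used instead, assuming your installed version behaves that way.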
Let us know if it works for you. Cheers.