How to fit a copula and calculate a conditional probability in Python


I would like to fit a copula to a dataframe with 2 columns: a and b. Then I have to calculate the conditional probability of a < 0 given b < -2, i.e. P(a<0 | b<-2).

I have tried the following code in Python using the copulas library. I am able to fit the copula to the data, but I am not sure how to calculate the cdf:

import pandas as pd
from copulas.multivariate import GaussianMultivariate

df = pd.read_csv("filename")  # file with the two columns a and b
cop = GaussianMultivariate()
cop.fit(df)

I understand that the cdf function can be used to calculate the conditional probability, but I am not sure how to use it here.


Solution

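  • The cdf method takes an array with one row per point (columns in the same order as the dataframe you fitted, here a then b) and returns one value per row: the joint cumulative probability P(a <= x, b <= y). It does not return a conditional probability by itself, but you can build one from it, since P(a<0 | b<-2) = P(a<0, b<-2) / P(b<-2).

    give this code a try:

    import numpy as np
    
    # a value far above the observed range of a, so that P(a < upper_a) is effectively 1
    upper_a = df['a'].max() + 10 * df['a'].std()
    
    # row 0 gives P(a<0, b<-2), row 1 gives (approximately) P(b<-2)
    inputs = np.array([[0.0, -2.0], [upper_a, -2.0]])
    p_a_and_b, p_b = cop.cdf(inputs)
    
    # P(A|B) = P(A and B) / P(B)
    conditional_prob = p_a_and_b / p_b
    

    another possible approach (a bit more formal, but longer): take P(B) from the fitted marginal of b instead of pushing a to a large value

    # Calculate P(A and B), i.e. P(a<0, b<-2), from the joint cdf
    p_a_and_b = cop.cdf(np.array([[0.0, -2.0]]))
    
    # Calculate P(B), i.e. P(b<-2), from the marginal fitted for column b
    # (this assumes the fitted model keeps its marginals in cop.univariates, in the order of cop.columns)
    marginal_b = cop.univariates[cop.columns.index('b')]
    p_b = marginal_b.cdf(np.array([-2.0]))[0]
    
    # Calculate P(A|B)
    conditional_prob = p_a_and_b / p_b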
    

    let us know if it works for you. cheers.
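
If you want a sanity check on whatever number you get, one option is a simulation estimate: draw many (a, b) pairs, keep the ones with b < -2, and see what fraction of those also has a < 0. Below is a rough sketch of that idea using only the standard library. The helper names (conditional_prob_from_samples, correlated_normal_pairs) are made up for illustration, and the correlated-normal sampler is only a stand-in so the snippet runs without your data; it is not your fitted copula.

```python
import random


def conditional_prob_from_samples(samples, a_max=0.0, b_max=-2.0):
    """Estimate P(a < a_max | b < b_max) from an iterable of (a, b) pairs."""
    in_b = 0        # samples with b < b_max
    in_a_and_b = 0  # of those, samples that also have a < a_max
    for a, b in samples:
        if b < b_max:
            in_b += 1
            if a < a_max:
                in_a_and_b += 1
    if in_b == 0:
        raise ValueError("no samples satisfy b < b_max; draw more samples")
    return in_a_and_b / in_b


def correlated_normal_pairs(n, rho, seed=0):
    """Stand-in sampler (hypothetical): n pairs of standard normals with correlation rho."""
    rng = random.Random(seed)
    scale = (1.0 - rho ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        b = rng.gauss(0.0, 1.0)
        a = rho * b + scale * rng.gauss(0.0, 1.0)
        pairs.append((a, b))
    return pairs


pairs = correlated_normal_pairs(200_000, rho=0.7)
estimate = conditional_prob_from_samples(pairs)
print(estimate)
```

With the fitted model from the question, you would presumably feed it samples from the copula instead, e.g. samples = cop.sample(100000) followed by conditional_prob_from_samples(zip(samples['a'], samples['b'])), assuming the sampled dataframe has the same a and b columns. Since b < -2 may be a rare event, draw enough samples that a reasonable number of rows survive the filter.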