python numpy statistics log-likelihood

Finding the log likelihood using numpy


I am trying to use numpy to get the log likelihood for naive Bayes. The following are the probabilities of getting a 1 in each dimension when the label is +1 and -1 respectively:

positive = [0.07973422, 0.02657807]
negative = [0.04651163, 0.02491694] # both of these have the dimension d

The following are the test data and the labels for the test data:

x = np.array([[0,1],[1,0],[1,1]]) # dimension is n*d : note that the d is same as above
y = np.array([-1,1,-1]) #dimension is n

# result that I want

result = [-3.73983529, -2.55599409, -6.76026018] # dimension is n

Logic: each result element corresponds to a row in x, and the value of y for that row determines whether to use the positive or the negative probabilities.

For example, row 0 is [0,1] with label -1, which means we take the negative probabilities:

-3.73983529 = log( 1 - 0.04651163 ) + log(0.02491694)

Here we subtract from 1 because the probability of a 0 is 1 minus the probability of a 1.
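As a quick sanity check of the arithmetic for row 0, the same calculation can be sketched in numpy. The names `row`, `per_dim` and `row_log_like` are only illustrative and are not part of the original post.

```python
import numpy as np

# P(x_j = 1 | y = -1) for each of the d dimensions (values from the question)
negative = np.array([0.04651163, 0.02491694])

row = np.array([0, 1])  # row 0 of x, whose label is -1

# Use p where the feature is 1 and 1 - p where the feature is 0
per_dim = np.where(row == 1, negative, 1 - negative)

# Sum of per-dimension log probabilities gives the row's log likelihood
row_log_like = np.log(per_dim).sum()
print(row_log_like)  # approximately -3.7398
```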

I am using explicit loops right now, but I want to solve this with vectorized numpy methods to make it faster.


Solution

  • Expand everything to shape n x d and then use np.where.

    import numpy as np
    
    positive = [0.07973422, 0.02657807]
    negative = [0.04651163, 0.02491694]  # both of these have the dimension d
    
    x = np.array(
        [[0, 1], [1, 0], [1, 1]]
    )  # dimension is n*d : note that the d is same as above
    y = np.array([-1, 1, -1])  # dimension is n
    
    d = len(positive)
    n = len(x)
    
    # Expand all arrays to shape n x d
    
    positive = np.array([positive]*n)
    negative = np.array([negative]*n)
    
    y = np.repeat(y, d).reshape(n, d)
    
    # Determine whether to use pos or neg probabilities
    pos_neg = np.where(y == 1, positive, negative)
    
    # Determine whether to use prob or 1-prob
    probs = np.where(x == 0, 1 - pos_neg, pos_neg)
    
    # Take logs and then sum
    log_probs = np.log(probs)
    
    log_like = np.sum(log_probs, axis = 1)
    
    print(log_like)
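A possible variation on the approach above: numpy broadcasting can usually stand in for the explicit tiling, so the probability vectors and labels never have to be copied out to n x d by hand. This is a minimal sketch under the same inputs as the question, not a replacement for the answer's method. The `y[:, None]` reshaping is the only new step.

```python
import numpy as np

positive = np.array([0.07973422, 0.02657807])  # P(x_j = 1 | y = +1), shape (d,)
negative = np.array([0.04651163, 0.02491694])  # P(x_j = 1 | y = -1), shape (d,)

x = np.array([[0, 1], [1, 0], [1, 1]])  # shape (n, d)
y = np.array([-1, 1, -1])               # shape (n,)

# y[:, None] has shape (n, 1); it broadcasts against the (d,) vectors to (n, d)
pos_neg = np.where(y[:, None] == 1, positive, negative)

# Use p where the feature is 1 and 1 - p where the feature is 0
probs = np.where(x == 1, pos_neg, 1 - pos_neg)

# Sum the log probabilities across the d dimensions of each row
log_like = np.log(probs).sum(axis=1)
print(log_like)
```

The result should agree with the values requested in the question (roughly -3.74, -2.56 and -6.76), up to rounding of the input probabilities.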