[SOLVED] finding log likelihood data using numpy

I am trying to use NumPy to compute the log likelihood for naive Bayes. The following are the probabilities of getting a 1 in each dimension when the label is +1 and -1, respectively:

positive = [0.07973422, 0.02657807]
negative = [0.04651163, 0.02491694]  # both of these have length d

The following are the test inputs and their labels:

x = np.array([[0,1],[1,0],[1,1]])  # shape is n x d, with the same d as above
y = np.array([-1,1,-1])  # length n

The result I want:

result = [-3.73983529, -2.55599409, -6.76026018]  # length n

Logic: each element of the result corresponds to one row of x, and the label y for that row decides whether to use the positive or the negative probabilities.

For example, row 0 is [0,1] with label -1, so we use the negative probabilities:

-3.73983529 = log( 1 - 0.04651163 ) + log(0.02491694)

Here we subtract from 1 because the probability of a 0 is one minus the probability of a 1.
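As a quick check of that arithmetic, the same sum can be computed directly. This is a minimal sketch using the rounded probabilities quoted above, so the last digit or so may differ slightly from the stated result:

```python
import math

# Row 0 is [0, 1] with label -1, so the negative-class probabilities apply
p_neg = [0.04651163, 0.02491694]

# Feature 0 is 0 -> use 1 - p; feature 1 is 1 -> use p
row0 = math.log(1 - p_neg[0]) + math.log(p_neg[1])
print(row0)  # approximately -3.7398
```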

I am currently using explicit Python loops, but I want to vectorize this with NumPy to make it faster.
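The loop version isn't shown in the question, so the following is only a hypothetical reconstruction of what it might look like. It is useful as a reference to check a vectorized version against. The function name `log_likelihood_loop` is made up for illustration:

```python
import math


def log_likelihood_loop(x, y, positive, negative):
    """Hypothetical loop-based naive Bayes log likelihood, one value per row.

    x: rows of 0/1 features (n rows, d columns)
    y: labels, +1 or -1 (length n)
    positive / negative: P(feature = 1 | label) for each of the d features
    """
    result = []
    for row, label in zip(x, y):
        # The row's label picks which set of probabilities to use
        probs = positive if label == 1 else negative
        total = 0.0
        for value, p in zip(row, probs):
            # P(feature = 0) is 1 - P(feature = 1)
            total += math.log(p if value == 1 else 1 - p)
        result.append(total)
    return result


positive = [0.07973422, 0.02657807]
negative = [0.04651163, 0.02491694]
x = [[0, 1], [1, 0], [1, 1]]
y = [-1, 1, -1]
loop_result = log_likelihood_loop(x, y, positive, negative)
print(loop_result)
```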

Solution

Cast everything to shape n x d, then use np.where to pick the right probability for each element.
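A compact sketch of this idea, assuming the same inputs as in the question. Here NumPy broadcasting does the casting to n x d implicitly: `y[:, None]` has shape (n, 1), and combining it with the length-d probability vectors yields an (n, d) result without building explicit copies:

```python
import numpy as np

positive = np.array([0.07973422, 0.02657807])  # P(x_j = 1 | y = +1), shape (d,)
negative = np.array([0.04651163, 0.02491694])  # P(x_j = 1 | y = -1), shape (d,)
x = np.array([[0, 1], [1, 0], [1, 1]])         # shape (n, d)
y = np.array([-1, 1, -1])                      # shape (n,)

# (n, 1) condition broadcast against (d,) vectors gives an (n, d) array
pos_neg = np.where(y[:, None] == 1, positive, negative)

# Where a feature is 0 use 1 - p, where it is 1 use p
probs = np.where(x == 0, 1 - pos_neg, pos_neg)

# Sum the per-feature log probabilities across each row
log_like = np.log(probs).sum(axis=1)
print(log_like)
```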

import numpy as np

positive = [0.07973422, 0.02657807]
negative = [0.04651163, 0.02491694]  # both of these have the dimension d

x = np.array(
    [[0, 1], [1, 0], [1, 1]]
)  # dimension is n*d : note that the d is same as above
y = np.array([-1, 1, -1])  # dimension is n

d = len(positive)
n = len(x)

# Cast all to n x d

positive = np.array([positive]*n)
negative = np.array([negative]*n)

y = np.repeat(y, d).reshape(n, d)

# Determine whether to use pos or neg probabilities
pos_neg = np.where(y == 1, positive, negative)

# Determine whether to use prob or 1-prob
probs = np.where(x == 0, 1 - pos_neg, pos_neg)

# Take logs and then sum
log_probs = np.log(probs)

log_like = np.sum(log_probs, axis=1)

print(log_like)
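Equivalently, each row's value is the Bernoulli log likelihood sum_j [x_j log p_j + (1 - x_j) log(1 - p_j)], so the second np.where can be replaced by arithmetic on the 0/1 features. The sketch below uses broadcasting (`y[:, None]`) to select per-row probabilities and `np.log1p(-p)` for log(1 - p), which is more accurate when p is tiny. It assumes every probability lies strictly between 0 and 1 (for example, smoothed estimates). Otherwise a term like 0 * log(0) produces nan:

```python
import numpy as np

positive = np.array([0.07973422, 0.02657807])
negative = np.array([0.04651163, 0.02491694])
x = np.array([[0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, -1])

# Per-row probabilities of a 1, chosen by label via broadcasting: shape (n, d)
p = np.where(y[:, None] == 1, positive, negative)

# Bernoulli log likelihood per row; assumes 0 < p < 1 everywhere
bernoulli_ll = np.sum(x * np.log(p) + (1 - x) * np.log1p(-p), axis=1)
print(bernoulli_ll)
```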