I am trying to use NumPy to compute the log-likelihood for naive Bayes. The following are the probabilities of getting 1 in each dimension when the label is +1 and -1, respectively:
positive = [0.07973422, 0.02657807]
negative = [0.04651163, 0.02491694] # both of these have dimension d
The following are the test inputs and their labels:
x = np.array([[0,1],[1,0],[1,1]]) # dimension is n*d; d is the same as above
y = np.array([-1,1,-1]) #dimension is n
# result that I want
result = [-3.73983529, -2.55599409, -6.76026018] # dimension is n
Logic: each element of the result corresponds to a row of x, and the label y for that row decides whether to use the positive or the negative probabilities.
For example, row 0 is [0,1] with label -1, so we use the negative probabilities:
-3.73983529 = log( 1 - 0.04651163 ) + log(0.02491694)
Here we subtract from 1 because the probability of 0 is 1 minus the probability of 1.
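Written out, the quantity for row i is the Bernoulli naive Bayes log-likelihood. The notation below is introduced here for illustration: p^{(+1)} stands for `positive`, p^{(-1)} for `negative`, and x_{ij} is the 0/1 entry of x:

```latex
\log p(\mathbf{x}_i \mid y_i) = \sum_{j=1}^{d} \Big[ x_{ij} \log p_j^{(y_i)} + (1 - x_{ij}) \log\big(1 - p_j^{(y_i)}\big) \Big]
```

For row 0 (x_0 = [0,1], y_0 = -1) only the (1 - x) term survives for j = 1 and only the x term for j = 2. This gives log(1 - 0.04651163) + log(0.02491694), which is approximately -3.73983529, as above.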
I am currently using explicit Python loops, but I want to solve this with vectorized NumPy operations to make it faster.
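The loop code itself isn't shown. The version below is a hypothetical reconstruction of such a loop, kept as a plain-Python baseline for checking the vectorized versions. The function name `log_likelihood_loop` is made up for illustration:

```python
import math

import numpy as np

positive = [0.07973422, 0.02657807]  # P(x_j = 1 | y = +1)
negative = [0.04651163, 0.02491694]  # P(x_j = 1 | y = -1)
x = np.array([[0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, -1])


def log_likelihood_loop(x, y, positive, negative):
    """Hypothetical baseline: one log term per (row, feature) pair."""
    result = []
    for row, label in zip(x, y):
        # Pick the probability vector that matches this row's label.
        probs = positive if label == 1 else negative
        total = 0.0
        for xj, pj in zip(row, probs):
            # P(x_j = 0) = 1 - P(x_j = 1)
            total += math.log(pj if xj == 1 else 1 - pj)
        result.append(total)
    return result


print(log_likelihood_loop(x, y, positive, negative))
```

Each row costs d Python-level iterations. That per-element overhead is what the NumPy versions remove.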
Cast everything to n x d and then use np.where.
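As a minimal sketch, the "cast to n x d" step can also be left to NumPy broadcasting instead of building explicit copies. This assumes `positive` and `negative` are 1-D arrays of length d. `np.where` broadcasts its condition and both choice arrays against each other:

```python
import numpy as np

positive = np.array([0.07973422, 0.02657807])  # shape (d,)
negative = np.array([0.04651163, 0.02491694])  # shape (d,)
y = np.array([-1, 1, -1])                         # shape (n,)

# y[:, None] has shape (n, 1). np.where broadcasts it against the two
# (d,) arrays, so the result has shape (n, d): row i equals `positive`
# when y[i] == 1 and `negative` otherwise.
pos_neg = np.where(y[:, None] == 1, positive, negative)
print(pos_neg.shape)
print(pos_neg)
```

The resulting `pos_neg` matches what the explicit-copy approach produces, but no n-fold copies of `positive`, `negative`, or `y` are created first.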
import numpy as np

positive = [0.07973422, 0.02657807]
negative = [0.04651163, 0.02491694] # both of these have the dimension d
x = np.array(
[[0, 1], [1, 0], [1, 1]]
) # dimension is n*d : note that the d is same as above
y = np.array([-1, 1, -1]) # dimension is n
d = len(positive)
n = len(x)
# Cast all to n x d
positive = np.array([positive]*n)
negative = np.array([negative]*n)
y = np.repeat(y, d).reshape(n, d)
# Determine whether to use pos or neg probabilities
pos_neg = np.where(y == 1, positive, negative)
# Determine whether to use prob or 1-prob
probs = np.where(x == 0, 1 - pos_neg, pos_neg)
# Take logs and then sum
log_probs = np.log(probs)
log_like = np.sum(log_probs, axis = 1)
print(log_like)
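A different technique from the np.where approach above also works. Expand the sum x*log(p) + (1 - x)*log(1 - p) into two matrix-vector products, score every row under both labels, and then pick the score that matches y. This sketch assumes every probability lies strictly between 0 and 1 (for example after smoothing), since log(0) would give -inf and 0 * -inf gives nan:

```python
import numpy as np

positive = np.array([0.07973422, 0.02657807])  # P(x_j = 1 | y = +1), shape (d,)
negative = np.array([0.04651163, 0.02491694])  # P(x_j = 1 | y = -1), shape (d,)
x = np.array([[0, 1], [1, 0], [1, 1]])          # shape (n, d), entries 0/1
y = np.array([-1, 1, -1])                         # shape (n,)

# Log-likelihood of every row under each class:
#   sum_j x_ij * log(p_j) + (1 - x_ij) * log(1 - p_j)
# np.log1p(-p) computes log(1 - p).
log_pos = x @ np.log(positive) + (1 - x) @ np.log1p(-positive)  # shape (n,)
log_neg = x @ np.log(negative) + (1 - x) @ np.log1p(-negative)  # shape (n,)

# Keep the score that matches each row's label.
log_like = np.where(y == 1, log_pos, log_neg)
print(log_like)
```

Because both `log_pos` and `log_neg` are available, `np.where(log_pos > log_neg, 1, -1)` would give the label with the higher likelihood. That comparison ignores class priors; a full naive Bayes prediction would add the log prior of each class first.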