python, bernoulli-probability

How do I use BernoulliNB?


I'm trying to use BernoulliNB. Using the same data to train and to test, I get predictions that differ from the training labels and probabilities other than 1. Why is that?

import pandas as pd
from sklearn.naive_bayes import BernoulliNB
BNB = BernoulliNB()

# Data
df_1 = pd.DataFrame({'O' : [1,2,3,1,1,3,1,2,2,1],
                     'I1': [1,0,0,1,0,0,1,1,0,1],
                     'I2': [0,0,1,0,0,1,0,0,1,0],
                     'I3': [1,0,0,0,0,0,1,0,0,0]})

df_I = df_1.iloc[:,1:4]
S_O  = df_1['O']

# Bernoulli Naive Bayes classifier
A_F = BNB.fit(df_I, S_O)        # fit on the inputs I1-I3 with O as the target
A_P = BNB.predict(df_I)         # predicted class labels
A_R = BNB.predict_proba(df_I)   # class membership probabilities

# Attach the predictions and probabilities to the original data
df_P = pd.DataFrame(A_P)
df_R = pd.DataFrame(A_R)

df_P.columns = ['Predicted A']
df_R.columns = ['Prob 1', 'Prob 2', 'Prob 3']

df_1 = df_1.join(df_P)
df_1 = df_1.join(df_R)

Results

O   I1  I2  I3  Predicted A Prob 1  Prob 2  Prob 3
1   1   0   1   1           .80     .15     .05
2   0   0   0   2           .59     .33     .08
3   0   1   0   3           .18     .39     .43
1   1   0   0   1           .59     .33     .08
1   0   0   0   2           .59     .33     .08
3   0   1   0   3           .18     .39     .43
1   1   0   1   1           .80     .15     .05
2   1   0   0   1           .59     .33     .08
2   0   1   0   3           .18     .39     .43
1   1   0   0   1           .59     .33     .08

I have tried to describe what I am trying to do here:

https://stats.stackexchange.com/questions/367829/how-probable-is-a-set


Solution

  • It's working correctly and you're using it right (code-wise). Predicted A is the predicted class label. In your case the possible labels are defined by O, namely 1, 2, and 3, and Predicted A will always take values from that set.

    As for the probabilities, there is no guarantee any of them will equal 1; in fact, they almost never will.

    I think your confusion stems from feeding the model its own training data and expecting the output to match it exactly. A naive Bayes classifier doesn't memorize rows; it estimates class priors and per-feature likelihoods, and with only ten samples, some of which share the same features but carry different labels (e.g. I1, I2, I3 = 1, 0, 0 appears under both O = 1 and O = 2), it cannot reproduce every training label. Feeding it more data will generally improve its accuracy on this known training set; the first sketch at the end of this answer illustrates the point.

    I'll also note that what you really want is to fit on a large known training set and then predict on an unseen test set; a minimal sketch of that workflow follows at the end of this answer. I could get into the details of why, but I'd recommend reading a tutorial on classifiers (the scikit-learn docs aren't bad, and any tutorial should cover this).

    Code-wise, everything looks good to me.
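
    To make the first two points concrete, here is a minimal sketch reusing the toy data from the question. It shows that every row of predict_proba sums to 1 (no single entry needs to be 1), that predict effectively returns the class in classes_ with the highest probability, and that some feature patterns occur under more than one label, so no classifier could reproduce every training label:

import numpy as np
import pandas as pd
from sklearn.naive_bayes import BernoulliNB

# Same toy data as in the question
df_1 = pd.DataFrame({'O' : [1,2,3,1,1,3,1,2,2,1],
                     'I1': [1,0,0,1,0,0,1,1,0,1],
                     'I2': [0,0,1,0,0,1,0,0,1,0],
                     'I3': [1,0,0,0,0,0,1,0,0,0]})
X, y = df_1[['I1', 'I2', 'I3']], df_1['O']

bnb = BernoulliNB().fit(X, y)
proba = bnb.predict_proba(X)

# Each row of probabilities sums to 1
print(proba.sum(axis=1))

# predict() picks the class with the largest probability
print(np.array_equal(bnb.predict(X), bnb.classes_[proba.argmax(axis=1)]))

# The same feature pattern occurs under different labels, e.g. (1, 0, 0)
# appears with both O = 1 and O = 2, so a perfect fit on this training set
# is impossible for any classifier
print(df_1.groupby(['I1', 'I2', 'I3'])['O'].nunique())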
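
    And here is a minimal sketch of the train/test workflow suggested above. The dataset is a made-up, larger binary one (a hypothetical stand-in, not the question's data); the point is only the shape of the workflow: fit on one split, then evaluate on rows the model has never seen.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score

# Hypothetical binary data: the label mostly follows I1/I2, plus some noise
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 3))
y = np.where(X[:, 0] == 1, 1, np.where(X[:, 1] == 1, 3, 2))
noise = rng.random(500) < 0.1
y = np.where(noise, rng.integers(1, 4, size=500), y)

# Hold out 30% of the rows; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

bnb = BernoulliNB().fit(X_train, y_train)
print(accuracy_score(y_test, bnb.predict(X_test)))   # accuracy on unseen rows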