machine-learning weka naivebayes

What does this output say exactly?


I'm using WEKA with the "weather.arff" dataset and applied the Naive Bayes classifier with 10-fold cross-validation, as you can see in the given snapshot. I understand pretty much everything except the parts that I marked in red in the picture.

There are 9 (yes) + 5 (no) = 14 instances altogether, but here the sums exceed that total. And what do yes (0.63) and no (0.38) mean? Are they related to the performance of the classifier after 10-fold CV?

outlook
  sunny             3.0     4.0
  overcast          5.0     1.0
  rainy             4.0     3.0
  [total]          12.0     8.0

The total here is 20.0, but we have only 14 instances. What are the yes and no counts for sunny, overcast, and rainy? Where did they come from?

What is this weighted sum? How is it calculated, and how does it relate to NB?

Click Here to see the picture


Solution

  • I found the answer to my question. This is called the "zero-frequency problem", and what WEKA does is add 1 to the count of each attribute value for each class (add-one, or Laplace, smoothing). The reason is to avoid zero probabilities: otherwise, when the probabilities are multiplied, a single zero makes the whole product 0. A zero count in the training data does not really tell us that the case is impossible, only that it was not observed. In addition, this has nothing to do with the number of cross-validation iterations or with the CV performance estimate.

    outlook                Yes            No
      sunny             (2+1)=3.0     (3+1)=4.0
      overcast          (4+1)=5.0     (0+1)=1.0
      rainy             (3+1)=4.0     (2+1)=3.0
      [total]             12.0           8.0
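
The table above can be reproduced with a few lines of Python. This is only a sketch to check the arithmetic: the raw counts are read off the table, the variable names are made up for illustration, and it is not WEKA's actual code.

```python
# Raw (yes, no) counts of each outlook value in the 14 training instances,
# taken from the table above before the +1 correction.
raw_counts = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}

# Add-one (Laplace) correction: every value/class cell gets +1.
smoothed = {value: (yes + 1, no + 1) for value, (yes, no) in raw_counts.items()}
total_yes = sum(yes for yes, _ in smoothed.values())
total_no = sum(no for _, no in smoothed.values())

for value, (yes, no) in smoothed.items():
    print(f"{value:<10}{yes:>6.1f}{no:>6.1f}")
print(f"{'[total]':<10}{total_yes:>6.1f}{total_no:>6.1f}")

# Without the correction, P(overcast | no) would be 0/5 = 0 and would zero
# out the whole product; with it, the estimate is small but non-zero (1/8).
p_overcast_no_raw = raw_counts["overcast"][1] / 5
p_overcast_no_smoothed = smoothed["overcast"][1] / total_no
print(p_overcast_no_raw, p_overcast_no_smoothed)
```

The corrected totals come out as 12 and 8, matching the [total] row, and the overcast/no cell is the one that would otherwise have been zero.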
    

    Actual instances = 9 + 5 = 14. The 6 added counts (3 outlook values × 2 classes) bring the displayed total to 12 + 8 = 20.

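    Another important thing is that WEKA does this for every nominal attribute, not just outlook; in this dataset that also means windy. Temperature and humidity are numeric in weather.arff, so the output shows a mean and standard deviation for them instead of counts.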
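
As for the yes (0.63) and no (0.38) figures from the question: they match the class priors you get if the same add-one correction is applied to the class counts, i.e. (9+1)/(14+2) and (5+1)/(14+2). The sketch below only checks that arithmetic. That WEKA computes the header values exactly this way is an assumption based on the numbers agreeing, and the names are hypothetical.

```python
# Class counts in the 14 training instances.
class_counts = {"yes": 9, "no": 5}
n_instances = sum(class_counts.values())   # 14
n_classes = len(class_counts)              # 2

# Add-one correction on the class counts: (count + 1) / (N + number of classes).
priors = {
    label: (count + 1) / (n_instances + n_classes)
    for label, count in class_counts.items()
}

for label, p in priors.items():
    print(label, p)
```

This gives 0.625 for yes and 0.375 for no, which would appear as 0.63 and 0.38 once rounded to two decimals in the output. So these are prior class probabilities, not a measure of classifier performance.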