Tags: statistics, bayesian, naive-bayes, document-classification

Understanding Bayes' Theorem


I'm working on an implementation of a Naive Bayes classifier. Programming Collective Intelligence introduces this subject by describing Bayes' Theorem as:

Pr(A | B) = Pr(B | A) x Pr(A) / Pr(B)

As well as a specific example relevant to document classification:

Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document)

I was hoping someone could explain the notation used here: what do Pr(A | B) and Pr(A) mean? They look like some sort of function, but then what does the pipe ("|") mean?


Solution

  • Pr(A) is the probability that event A occurs, and Pr(A | B) is the conditional probability that A occurs given that B has already occurred; the pipe is read as "given". The formula above is just the calculation of a conditional probability. What you want is a classifier, which uses this principle to decide whether something belongs to a category based on the prior probabilities.

    See http://en.wikipedia.org/wiki/Naive_Bayes_classifier for a complete example
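
    To tie the formula back to the classifier: a minimal, illustrative Python sketch follows (this is not the book's code, and the names NaiveBayesClassifier, train, and classify are assumptions made for the example). It scores each category by Pr(Category) times the product of Pr(word | Category) for the words in the document, dropping the shared Pr(Document) denominator since it is the same for every category, and works in log space to avoid numerical underflow.

    ```python
    # Minimal Naive Bayes sketch: scores a document against each category
    # using Pr(Category) * product of Pr(word | Category). Illustrative only.
    from collections import defaultdict
    import math

    class NaiveBayesClassifier:
        def __init__(self):
            self.word_counts = defaultdict(lambda: defaultdict(int))  # category -> word -> count
            self.category_counts = defaultdict(int)                   # category -> document count
            self.vocabulary = set()

        def train(self, document, category):
            """Count words per category so Pr(word | category) can be estimated later."""
            self.category_counts[category] += 1
            for word in document.lower().split():
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

        def classify(self, document):
            """Return the category with the highest posterior score.

            Pr(Document) is ignored because it is identical for all categories.
            Add-one (Laplace) smoothing keeps unseen words from zeroing a score.
            """
            total_docs = sum(self.category_counts.values())
            best_category, best_score = None, float("-inf")
            for category, doc_count in self.category_counts.items():
                score = math.log(doc_count / total_docs)  # log Pr(Category)
                words_in_category = sum(self.word_counts[category].values())
                for word in document.lower().split():
                    count = self.word_counts[category].get(word, 0)
                    # log Pr(word | Category) with add-one smoothing
                    score += math.log((count + 1) / (words_in_category + len(self.vocabulary)))
                if score > best_score:
                    best_category, best_score = category, score
            return best_category

    if __name__ == "__main__":
        clf = NaiveBayesClassifier()
        clf.train("cheap pills buy now", "spam")
        clf.train("limited offer buy cheap", "spam")
        clf.train("meeting agenda for tomorrow", "ham")
        clf.train("lunch tomorrow with the team", "ham")
        print(clf.classify("buy cheap pills"))         # expected: spam
        print(clf.classify("agenda for the meeting"))  # expected: ham
    ```

    The "naive" part is the assumption that words are independent given the category, which is what lets Pr(Document | Category) be approximated as the product of the per-word probabilities.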