java lda topic-modeling mallet

Java Mallet LDA keyword distributions


I have used the Java Mallet API for topic modelling with LDA. The API produces results in the following form: topic : keyword1 (count), keyword2 (count)

For example

topic 0 : file (12423), test (3123) ...
topic 1 : class (2415), test (314) ...

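Is it right that the keyword probabilities for topic 0 are file = 12423 / (12423 + 3123 + ...), test = 3123 / (12423 + 3123 + ...), and so on, i.e. each count divided by the sum of all counts in that topic?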


Solution

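  • That's one way to estimate the probabilities: divide each count by the sum of all counts in the topic. You can also add a smoothing parameter (usually 0.01) to each count, and add 0.01 times the size of the vocabulary to the denominator so that the probabilities still add up to 1.0 over the whole vocabulary (words that never occur in the topic then get a small non-zero probability).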
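
As a rough illustration of both estimates, here is a minimal sketch in plain Java. It does not call Mallet at all: the class `TopicWordProbabilities`, the method `normalize`, and the parameter name `beta` are made up for this example, and the counts are just the two values from the question, treated as if the vocabulary had only those two words (which is an assumption, since the real list is longer).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Mallet.
public class TopicWordProbabilities {

    // Turns the word counts of one topic into probabilities.
    // beta is the smoothing value added to every count (0.0 = no smoothing).
    // vocabularySize is the number of distinct words in the whole vocabulary.
    static Map<String, Double> normalize(Map<String, Integer> counts,
                                         double beta,
                                         int vocabularySize) {
        double total = 0.0;
        for (int count : counts.values()) {
            total += count;
        }
        // Adding beta once per vocabulary word keeps the sum at 1.0.
        double denominator = total + beta * vocabularySize;

        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            probabilities.put(entry.getKey(), (entry.getValue() + beta) / denominator);
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // Toy data: only the two counts quoted in the question (assumed vocabulary of 2).
        Map<String, Integer> topic0 = new LinkedHashMap<>();
        topic0.put("file", 12423);
        topic0.put("test", 3123);

        System.out.println("unsmoothed: " + normalize(topic0, 0.0, topic0.size()));
        System.out.println("smoothed:   " + normalize(topic0, 0.01, topic0.size()));
    }
}
```

With the full keyword list of a topic, `vocabularySize` should be the size of the whole vocabulary rather than the number of keywords printed, and words missing from the topic would each get `beta / denominator`.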