I want to use Mallet as part of an expert-finding project. I'm fairly new to Mallet, but I know that it trains topics from a set of documents. Let's say that I have 50 topics trained by Mallet. I want to calculate either p(topic|q) or p(q|topic), where q is the query: a single word (such as "algorithm" or "android") naming the area in which I want to find experts.
In this post, "how to get word-topic probability using mallet", one of the users said we can calculate the probability using the --word-topic-counts-file option. Let's say that I have generated this file with Mallet. It has the following structure:
0 android 2:21
1 is 3:3
.
.
.
I understand the semantics of this structure, but I don't know how to calculate the probability of a topic given the query, i.e. either p(topic|q) or p(q|topic).
P.S.: I use the word "either" because I'm not sure which of the two Mallet calculates.
Any help would be appreciated.
Take this example line from GlieBrt's answer to the linked question:
1 needham 19:2 17:1
Here p(topic|q) can be calculated as
p(19|needham) = 2/3 = 0.67
and
p(17|needham) = 1/3 = 0.33
With your own example, it is even simpler, because the word occurs in only one topic:
0 android 2:21
p(2|android) = 21/21 = 1.0
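The same calculation can be sketched in a few lines of Python. This is only an illustration, assuming each line of the file has the form "index word topic:count topic:count ..." as in the examples above. The function names parse_word_topic_counts and topic_given_word are made up for this sketch and are not part of Mallet, and the result is the plain unsmoothed ratio of counts.

```python
def parse_word_topic_counts(lines):
    """Map each word to {topic_id: count} from lines like '1 needham 19:2 17:1'."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines or words with no topic counts
        word = fields[1]  # fields[0] is the word's index in the file
        topic_counts = {}
        for pair in fields[2:]:
            topic, count = pair.split(":")
            topic_counts[int(topic)] = int(count)
        counts[word] = topic_counts
    return counts


def topic_given_word(counts, word):
    """Unsmoothed p(topic | word): each topic count divided by the word's total."""
    topic_counts = counts.get(word, {})
    total = sum(topic_counts.values())
    if total == 0:
        return {}  # word not seen in the file
    return {topic: c / total for topic, c in topic_counts.items()}


# The two example lines used in this thread (hypothetical in-memory input;
# in practice you would pass an open file handle instead of a list).
sample = ["0 android 2:21", "1 needham 19:2 17:1"]
counts = parse_word_topic_counts(sample)
print(topic_given_word(counts, "needham"))
print(topic_given_word(counts, "android"))
```

For a real query you would read the file once, then call topic_given_word(counts, q) for each query word q.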