How to get the probability of a topic given a query using Mallet


I want to use Mallet as part of an expert-finding project. I'm fairly new to Mallet, but I know that it trains topics from a set of documents. Let's say that I have 50 topics trained by Mallet. I want to calculate one of these probabilities: p(topic|q) or p(q|topic).

q is the query. It is a single word (such as "algorithm" or "android") naming the area in which I want to find experts.

In the post "how to get word-topic probability using mallet", one of the users said that the probability can be calculated using the --word-topic-counts-file option. Let's say that I have generated this file with Mallet. It has the following structure:

0 android 2:21
1 is 3:3
.
.
.

I understand the semantics of this structure, but I don't know how to calculate the probability of a topic given the query, i.e. p(topic|q), or alternatively p(q|topic).

P.S.: I say "or" because I'm not sure which of the two Mallet calculates.

Any help would be appreciated


Solution

  • Take this example line from GlieBrt's answer to the linked question

    1 needham 19:2 17:1
    

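    Each topic:count pair gives the number of times the word was assigned to that topic, so p(topic|q) is that count divided by the word's total count across all topics: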

    p(19|needham) = 2/3 = 0.67

    and

    p(17|needham) = 1/3 = 0.33

    With your own example, it is even simpler, because the word occurs in only one topic:

    0 android 2:21
    

    p(2|android) = 21/21 = 1.0
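
The calculation above can be sketched in Python. This is only a minimal illustration, assuming each line of the file has the form "index word topic:count topic:count ..." as in the examples. The function names, and the sample lines with their indices, are made up for the example and are not part of Mallet. The p(q|topic) function is an unsmoothed estimate: it divides the word's count in a topic by the total count of all words in that topic, so it may differ slightly from the values Mallet uses internally, which include a smoothing prior.

```python
def parse_word_topic_counts(lines):
    """Parse lines like '1 needham 19:2 17:1' into {word: {topic: count}}."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        word = parts[1]
        counts[word] = {
            int(topic): int(count)
            for topic, count in (pair.split(":") for pair in parts[2:])
        }
    return counts


def p_topic_given_word(counts, word):
    """p(topic|word): the word's count in each topic over its total count."""
    topic_counts = counts.get(word, {})
    total = sum(topic_counts.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in topic_counts.items()}


def p_word_given_topic(counts, word, topic):
    """Unsmoothed p(word|topic): the word's count in the topic over the
    total count of all words in that topic."""
    topic_total = sum(tc.get(topic, 0) for tc in counts.values())
    if topic_total == 0:
        return 0.0
    return counts.get(word, {}).get(topic, 0) / topic_total


# Illustrative sample (hypothetical indices); in practice, read the lines
# from the file written by --word-topic-counts-file.
sample = ["0 android 2:21", "1 needham 19:2 17:1", "2 is 3:3"]
counts = parse_word_topic_counts(sample)

print(p_topic_given_word(counts, "needham"))
print(p_topic_given_word(counts, "android"))
print(p_word_given_topic(counts, "android", 2))
```

To rank topics for a query word, sort the dictionary returned by p_topic_given_word by value. A word that never appears in the file yields an empty dictionary.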