python gensim lda

How to get the score of filtered bi-grams in gensim?


Given a list of tokenized documents, e.g. [['cow','boy','hat','mat'],['village','boy','water','cow'], ...], gensim can be used to detect bi-grams as follows:

import gensim
bigrams = gensim.models.Phrases(data_words, min_count=1, threshold=1)
bigram_model = gensim.models.phrases.Phraser(bigrams)

How can I get the score of each bi-gram detected by bigram_model?


Solution

  • It turns out to be as simple as reading the phrasegrams attribute of the model:

    bigram_model.phrasegrams
    

    which yields something like the following:

    {(b'cow', b'boy'): 23.3228613654742079,
     (b'village', b'water'): 1.3228613654742079}
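
To make sense of these numbers, it may help to see roughly how such a score is computed. The following is a minimal pure-Python sketch of the Mikolov-style formula that gensim's default scorer is based on: (count(a, b) - min_count) / (count(a) * count(b)) * vocab_size. The function name default_bigram_scores and the toy corpus are hypothetical and not part of gensim. Treating the vocabulary size as the number of distinct unigrams plus distinct bigrams is an assumption made here for illustration, so the values will not necessarily match gensim's output exactly.

```python
from collections import Counter


def default_bigram_scores(sentences, min_count=1, threshold=1.0):
    """Score adjacent word pairs; hypothetical helper, not a gensim API."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        unigrams.update(sentence)
        # Adjacent pairs only, e.g. ('cow', 'boy')
        bigrams.update(zip(sentence, sentence[1:]))

    # Assumption: vocabulary size = distinct unigrams + distinct bigrams
    vocab_size = len(unigrams) + len(bigrams)

    scores = {}
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - min_count) / (unigrams[a] * unigrams[b]) * vocab_size
        # Keep only pairs whose score exceeds the threshold
        if score > threshold:
            scores[(a, b)] = score
    return scores


# Toy corpus (hypothetical): 'cow boy' occurs twice, every other pair once
toy_words = [['cow', 'boy', 'hat'], ['cow', 'boy', 'mat'], ['village', 'water']]
scores = default_bigram_scores(toy_words, min_count=1, threshold=1)
print(scores)
```

Under this formula, a pair that occurs exactly min_count times gets a score of 0, so with min_count=1 only pairs seen at least twice can pass a positive threshold.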