Given a list of tokenized documents, e.g. [['cow','boy','hat','mat'],['village','boy','water','cow'], ...], gensim can be used to detect bigrams as follows:
import gensim
bigrams = gensim.models.Phrases(data_words, min_count=1, threshold=1)
bigram_model = gensim.models.phrases.Phraser(bigrams)
I was wondering how to get the score of each bigram detected by bigram_model.
It turns out that it is as simple as using:
bigram_model.phrasegrams
which yields something like the following:
{(b'cow', b'boy'): 23.3228613654742079,
(b'village', b'water'): 1.3228613654742079}
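
To give a feel for what these scores mean, here is a minimal pure-Python sketch of the formula that, as far as I understand, gensim's default scorer is based on: (count(a, b) - min_count) * vocab_size / (count(a) * count(b)). The function name bigram_scores, the extra third document, and the choice to count only distinct unigrams as the vocabulary size are my own assumptions for illustration, so the numbers will not match gensim's output exactly.

```python
from collections import Counter


def bigram_scores(sentences, min_count=1, threshold=1.0):
    """Score adjacent word pairs with a Mikolov-style collocation formula.

    Hypothetical helper for illustration only; it is not part of gensim.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        unigrams.update(sentence)
        # Pair each word with its right-hand neighbour.
        bigrams.update(zip(sentence, sentence[1:]))

    # Assumption: vocabulary size = number of distinct unigrams.
    vocab_size = len(unigrams)

    scores = {}
    for (a, b), count_ab in bigrams.items():
        score = (count_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        # Keep only pairs whose score exceeds the threshold.
        if score > threshold:
            scores[(a, b)] = score
    return scores


# Toy corpus; the third document is added so that some pairs repeat.
data_words = [
    ['cow', 'boy', 'hat', 'mat'],
    ['village', 'boy', 'water', 'cow'],
    ['cow', 'boy', 'hat'],
]
scores = bigram_scores(data_words, min_count=1, threshold=0.5)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 4))
```

A pair that occurs only min_count times scores 0 under this formula, so on a tiny corpus only repeated pairs survive. A higher score means the two words occur together more often than their individual frequencies would suggest.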