python nlp word2vec gensim word-sense-disambiguation

How do I find a synonym of a word or multi-word paraphrase using the gensim toolkit?


Having loaded a pre-trained word2vec model with the gensim toolkit, I would like to find a synonym of a word given its context, such as 'intelligent' for 'bright' in 'she is a bright person'.


Solution

  • There's a method [most_similar()][1] that reports the words whose vectors are closest to a given word's vector, by cosine similarity in the model's coordinates. For example:

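    # In recent gensim versions, a full Word2Vec model exposes this as loaded_w2v_model.wv.most_similar(...)
    similars = loaded_w2v_model.most_similar('bright')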
    

    However, word2vec won't find strict synonyms, just words that were contextually related in its training corpus. These are often synonym-like, but they can also be similar in other ways, such as being used in the same topical domains or being able to replace each other functionally. (In that last respect, the highly similar word-vectors are sometimes antonyms, because words like 'hot' and 'cold' appear in the same places, referring to the same aspect of something.)

    Plain word2vec also doesn't handle polysemy well: a token like 'bright' is both a word for 'well-lit' and a word for 'smart', but it gets only one vector. So the list of most-similar words for 'bright' will include a mix from its alternate senses.
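
To make the cosine-similarity ranking concrete, here is a minimal pure-Python sketch of the kind of lookup most_similar() performs. The three-dimensional vectors, the `TOY_VECTORS` name, and the `toy_most_similar` helper are all invented for illustration. They do not come from any real model, and gensim's own implementation is vectorized and differs in detail.

```python
import math

# Hypothetical toy vectors, made up for this example only.
# 'bright' deliberately sits between a "light" sense and a "smart" sense.
TOY_VECTORS = {
    "bright": [0.7, 0.7, 0.1],
    "smart": [0.2, 0.9, 0.1],
    "sunny": [0.9, 0.1, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_most_similar(vectors, word, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # the query word itself is excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


for neighbor, score in toy_most_similar(TOY_VECTORS, "bright"):
    print(f"{neighbor}: {score:.3f}")
```

With these made-up vectors, both 'smart' and 'sunny' score far above 'banana'. That is the mixed-sense behaviour described above: the single vector for 'bright' is close to neighbours from both of its meanings.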