python nlp gensim glove

glove most similar to multiple words


I am supposed to do some exercises with GloVe in Python. Most of them don't give me any problems, but now I am supposed to find the 5 words most similar to "norway - war + peace" using the "glove-wiki-gigaword-100" model. When I run my code, it just says that the 'word' is not in the vocabulary. I'm guessing this is a formatting issue, but I don't know how to write the query correctly.

import gensim.downloader as api
model = api.load("glove-wiki-gigaword-100")  # download the model and return as object ready for use

bests = model.most_similar("norway - war + peace", topn= 5)

print("5 most similar words to 'norway - war + peace':")

for best in bests:
    print(best)

Solution

  • Gensim's word-vector models only know the words in their vocabulary. Here you pass the whole expression "norway - war + peace" as a single string, so it is looked up as one word and not found. What you want to do is:

    1. Get the vectors v1, v2 and v3 for the words "norway", "war" and "peace", respectively.
    2. Compute v = v1 - v2 + v3.
    3. Get the words most similar to v.
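The three steps above can be sketched roughly as follows. This is a minimal sketch, assuming `model` is the `KeyedVectors` object that `api.load("glove-wiki-gigaword-100")` returns in the question; the helper name `similar_by_arithmetic` is made up for illustration. Because `similar_by_vector()` does not exclude the query words from its results, the sketch asks for a few extra hits and filters them out.

```python
def similar_by_arithmetic(model, topn=5):
    """Rank words by similarity to vec("norway") - vec("war") + vec("peace").

    `model` is assumed to be a gensim KeyedVectors object, e.g. the one
    returned by gensim.downloader.load("glove-wiki-gigaword-100").
    """
    query_words = ("norway", "war", "peace")

    # Step 1: look up one vector per word.
    v1, v2, v3 = (model[word] for word in query_words)

    # Step 2: do the arithmetic on the vectors, not on the strings.
    v = v1 - v2 + v3

    # Step 3: rank the vocabulary by similarity to v.
    # similar_by_vector() can return the query words themselves, so ask
    # for a few extra hits and drop them afterwards.
    hits = model.similar_by_vector(v, topn=topn + len(query_words))
    return [(word, score) for word, score in hits if word not in query_words][:topn]
```

Called as `similar_by_arithmetic(model)` after loading the model as in the question, this returns a list of `(word, score)` pairs.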

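    To do so, you will need these functions: model.most_similar() and model.similar_by_vector(). api.load() returns the word vectors themselves (a KeyedVectors object), so both are called on model directly; on a trained Word2Vec model they are found under model.wv instead. Note that model.most_similar() does something similar to these three steps in a single call, taking a list of positive words and a list of negative words. See the documentation for details.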
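For the query in the question, the positive/negative form could look roughly like this. It is a minimal sketch, again assuming `model` is the `KeyedVectors` object returned by `api.load(...)`; the helper name `similar_by_sets` is illustrative only.

```python
def similar_by_sets(model, topn=5):
    """Top matches for "norway - war + peace" via positive/negative word lists."""
    # Words that are added go in `positive`, words that are subtracted go in
    # `negative`. The input words are left out of the results.
    return model.most_similar(
        positive=["norway", "peace"],
        negative=["war"],
        topn=topn,
    )
```

The scores may differ slightly from the plain vector arithmetic, because `most_similar()` normalizes each word vector to unit length before combining them.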