
Convert vector to a word with Fasttext


I trained a FastText model on a dataset, and I can convert every word to a vector. Now I want to go the other way and convert a vector back to its unique word.

For example, I have this vector for the word "the", and I want to convert it back to that word using my FastText model.

[-0.0193,  0.1951, -0.1819, -0.3403,  0.3106,  0.2078, -0.0274,
  0.0346, -0.0239,  0.1478, -0.0802, -0.0720,  0.2250,  0.0943,
 -0.0288, -0.0493,  0.1270, -0.0680, -0.1122,  0.0083, -0.0060,
  0.1109, -0.0454, -0.2186,  0.0731,  0.0368,  0.1594,  0.0640,
                        ....
 -0.1320,  0.2031,  0.1679, -0.0396, -0.2523, -0.0785, -0.0268,
  0.0182, -0.0330, -0.2324, -0.1024, -0.1578,  0.2445, -0.0421,
 -0.0757,  0.0089, -0.2211,  0.0022, -0.2253, -0.0776]

It's a 300-dimensional vector. What should I do?


Solution

  • An arbitrary vector doesn't necessarily have a unique word. Rather, FastText models can often report a ranked list of the words nearest your target vector. (Of course, if your vector does in fact match a word's vector exactly, that word will be the 1st item in this ranked list.)

    I can't find a supported method in Facebook's fasttext Python wrapper for this - if it's there, it's not prominent in their docs.

    But in the alternative Python 'gensim' library – which can load FastText models from elsewhere – the method to use is .most_similar():

    https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.FastTextKeyedVectors.most_similar

    Specifically, it can take either words or vectors as the items in its positive & negative parameters. To get the nearest neighbors of a single vector vec, you could use:

        ranked_neighbors = ft_model.most_similar(positive=[vec])
    

    The list you get back includes both words and their similarity-scores, so to just get the single best match (no matter how far it is from your target), you could use:

        top_hit = ranked_neighbors[0][0]
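
To make the idea of a "ranked list of nearest words" concrete, here is a minimal sketch in plain Python of a cosine-similarity lookup, which is the kind of ranking .most_similar() performs. The 3-dimensional toy_vectors table and the helper names cosine and nearest_words are hypothetical stand-ins for illustration, not part of gensim or fasttext; a real model would hold 300-dimensional vectors for its whole vocabulary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(vec, word_vectors, topn=3):
    """Return up to topn (word, similarity) pairs, most similar first."""
    scored = [(word, cosine(vec, wv)) for word, wv in word_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vocabulary; values are made up for illustration only.
toy_vectors = {
    "the": [0.9, 0.1, 0.0],
    "a": [0.8, 0.3, 0.1],
    "cat": [0.0, 0.2, 0.9],
}

# Query with a vector that exactly matches the stored vector for "the".
ranked = nearest_words([0.9, 0.1, 0.0], toy_vectors)
top_hit = ranked[0][0]  # the word of the best match
print(ranked)
print(top_hit)
```

Because the query matches a stored vector exactly, that word ranks first with a similarity of (approximately) 1.0. For a vector that matches no word exactly, the top hit is simply the closest word, so it is worth checking the similarity score before trusting it. With a real model loaded in gensim, similar_by_vector(vec, topn=1) on the keyed vectors should give an equivalent lookup to most_similar(positive=[vec]).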