gensim word2vec embedding glove

Is there a way to remove a word from a KeyedVectors vocab?


I need to remove an invalid word from the vocab of a gensim.models.keyedvectors.Word2VecKeyedVectors instance.

I tried removing it with del model.vocab[word]. When I print model.vocab, the word is gone, but when I run model.most_similar on other words, the deleted word still shows up among the results. How can I delete a word from model.vocab so that model.most_similar no longer returns it?


Solution

  • There's no existing method supporting the removal of individual words.

    A quick-and-dirty workaround: when you remove the vocab entry, note the index of its vector (in the underlying large vector array), and change the string at that index in the kv_model.index2entity list to some plug value (say, '***DELETED***').

    Then, after any most_similar() call, discard any results matching '***DELETED***'.
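
A minimal sketch of that workaround, assuming the older gensim 3.x attribute names used in the question (vocab, whose entries carry an .index attribute, and index2entity). The helpers mask_word and drop_masked are hypothetical names, not part of gensim, and fake_kv is only a stand-in object so the snippet runs without a trained model; in gensim 4.x these attributes were renamed (key_to_index, index_to_key), so the code would need adapting there.

```python
from types import SimpleNamespace

PLUG = '***DELETED***'  # plug value from the answer; any unlikely string works


def mask_word(kv_model, word, plug=PLUG):
    """Hypothetical helper: drop `word` from the vocab and overwrite its label.

    The vector itself stays in the underlying array; only the string that
    most_similar() would report for that row is replaced by the plug value.
    """
    entry = kv_model.vocab.pop(word)            # KeyError if the word is absent
    kv_model.index2entity[entry.index] = plug   # relabel the row at that index


def drop_masked(results, plug=PLUG):
    """Hypothetical helper: filter (word, score) pairs from most_similar()."""
    return [(w, s) for w, s in results if w != plug]


# Stand-in for a KeyedVectors object (assumption: same attribute layout).
fake_kv = SimpleNamespace(
    vocab={
        'cat': SimpleNamespace(index=0),
        'dgo': SimpleNamespace(index=1),   # the invalid word to remove
        'dog': SimpleNamespace(index=2),
    },
    index2entity=['cat', 'dgo', 'dog'],
)

mask_word(fake_kv, 'dgo')

# Made-up scores standing in for what most_similar() might return afterwards.
raw_results = [(PLUG, 0.91), ('dog', 0.83)]
filtered = drop_masked(raw_results)
```

With a real model, the call would look like drop_masked(kv_model.most_similar('cat')). Since a masked row can still occupy one of the topn slots, it may be worth requesting a slightly larger topn than you need and trimming after filtering.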