I need to remove an invalid word from the vocab of a "gensim.models.keyedvectors.Word2VecKeyedVectors".
I tried to remove it using del model.vocab[word]. If I print model.vocab, the word is gone, but when I run model.most_similar with other words, the word I deleted still appears as similar.
So how can I delete a word from model.vocab in a way that also affects model.most_similar, so that it no longer returns that word?
There's no existing method supporting the removal of individual words.
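Deleting the vocab entry alone isn't enough, because most_similar() ranks the rows of the underlying vector array and names them via kv_model.index2entity, and neither of those is changed.
A quick-and-dirty workaround: at the same time as you remove the vocab entry, note the index of the word's vector (in the underlying large vector array), and change the string at that index in the kv_model.index2entity list to some plug value (say, '***DELETED***').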
Then, after performing any most_similar(), discard any results matching '***DELETED***'.
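A minimal sketch of that workaround, assuming gensim 3.x-style attributes (kv.vocab[word].index and the kv.index2entity list). The helper names delete_word and most_similar_filtered are hypothetical, and ToyKeyedVectors is a hypothetical stand-in that mimics only those attributes so the sketch runs without gensim; with a real model you would pass its KeyedVectors object instead. Since filtering can shrink the result list, the sketch asks most_similar() for one extra result per masked word.

```python
import math
from collections import namedtuple

DELETED = '***DELETED***'  # plug value from the answer above


def delete_word(kv, word):
    """Hypothetical helper: drop `word` from kv.vocab and mask its name.

    Assumes gensim-3.x-style attributes: kv.vocab[word].index is the row of
    the word's vector, and kv.index2entity is the parallel list of strings.
    The vector itself stays in the array, so most_similar() can still rank it.
    """
    entry = kv.vocab.pop(word)  # raises KeyError if the word is not present
    kv.index2entity[entry.index] = DELETED


def most_similar_filtered(kv, positive, topn=10):
    """Hypothetical helper: most_similar() minus masked entries."""
    # Each deleted word occupies at most one slot, so over-fetch by that many
    # to still return `topn` real words when enough remain.
    n_deleted = kv.index2entity.count(DELETED)
    results = kv.most_similar(positive=positive, topn=topn + n_deleted)
    return [(w, s) for w, s in results if w != DELETED][:topn]


# --- Hypothetical toy stand-in so the sketch runs without gensim installed ---
Vocab = namedtuple('Vocab', 'index')


class ToyKeyedVectors:
    """Mimics only the attributes used above; not part of gensim."""

    def __init__(self, word_vectors):
        self.index2entity = list(word_vectors)
        self.vectors = [word_vectors[w] for w in self.index2entity]
        self.vocab = {w: Vocab(i) for i, w in enumerate(self.index2entity)}

    @staticmethod
    def _unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def most_similar(self, positive, topn=10):
        # Query = normalized sum of the positive words' unit vectors; their
        # own rows are excluded from the ranking.
        rows = {self.vocab[w].index for w in positive}
        query = [0.0] * len(self.vectors[0])
        for i in rows:
            query = [q + x for q, x in zip(query, self._unit(self.vectors[i]))]
        query = self._unit(query)
        sims = [
            (self.index2entity[i], sum(q * x for q, x in zip(query, self._unit(v))))
            for i, v in enumerate(self.vectors)
            if i not in rows
        ]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        return sims[:topn]
```

For example, after delete_word(kv, 'badword') on a toy model where 'badword' is the nearest neighbour of 'cat', most_similar_filtered(kv, ['cat'], topn=2) returns the next two real words instead. Note that gensim 4.x renamed these attributes (key_to_index, index_to_key), so the attribute names would need adjusting there.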