I want to load a pre-trained embedding to initialize my own unsupervised FastText model and retrain it with my dataset.
The trained embedding file I have loads fine with gensim.models.KeyedVectors.load_word2vec_format('model.txt'). But when I try:
FastText.load_fasttext_format('model.txt')
I get: NotImplementedError: Supervised fastText models are not supported.
Is there any way to convert the supervised KeyedVectors to an unsupervised FastText model? And if it's possible, is it a bad idea?
I know there is a great difference between supervised and unsupervised models, but I really want to try to use/convert this one and retrain it. I'm not finding a trained unsupervised model to load for my case (it's a Portuguese dataset), and the best model I've found is that one.
If your model.txt file loads OK with KeyedVectors.load_word2vec_format('model.txt'), then that's just a simple set of word-vectors. (That is, not a 'supervised' model.)
However, Gensim's FastText doesn't support preloading a simple set of vectors for further training - for continued training, it needs a full FastText model, either from Facebook's binary format or from a prior Gensim FastText model's .save().
(The fact that trying to load a plain-vectors file generates that error suggests the load_fasttext_format() method is mistakenly interpreting it as some other kind of binary FastText model it doesn't support.)
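For reference, a rough sketch of the two supported starting points, assuming a recent Gensim where load_facebook_model() has replaced load_fasttext_format(), and using cc.pt.300.bin only as an example filename for Facebook's full Portuguese binary model:

```python
from gensim.models import FastText
from gensim.models.fasttext import load_facebook_model

# Option 1: a full Facebook-format model (the .bin file, not the .vec/.txt vectors).
# 'cc.pt.300.bin' is just an example filename for Facebook's Portuguese model.
ft_model = load_facebook_model('cc.pt.300.bin')

# Option 2: a prior Gensim FastText model saved with .save().
# ft_model = FastText.load('my_fasttext.model')

# Either way, the loaded model carries the full-word *and* ngram weights,
# plus vocabulary/frequency info, so continued training is supported:
new_sentences = [["um", "exemplo", "tokenizado"], ["outra", "frase"]]  # your corpus
ft_model.build_vocab(corpus_iterable=new_sentences, update=True)
ft_model.train(corpus_iterable=new_sentences,
               total_examples=len(new_sentences),
               epochs=ft_model.epochs)
```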
Update after comment below:
Of course you can mutate a model however you like, including ways not officially supported by Gensim. Whether that's helpful is another matter.
You can create an FT model with a compatible/overlapping vocabulary, load the old word-vectors separately, then copy each prior vector over the corresponding (randomly-initialized) vector in the new model. (Note that the property that affects further training is actually ftModel.wv.vectors_vocab, the trained-up full-word vectors, not .vectors, which is composited from full-words & ngrams.)
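A minimal sketch of that copy-over, assuming gensim 4.x attribute names (key_to_index, vectors_vocab) and a toy corpus standing in for your real training data:

```python
from gensim.models import FastText, KeyedVectors

# 1. Load the prior plain word-vectors.
prior_kv = KeyedVectors.load_word2vec_format('model.txt')

# 2. Build a new FastText model over *your* corpus, so vocabulary and
#    word-frequency info come from the data you'll keep training on.
sentences = [["um", "exemplo", "tokenizado"], ["outra", "frase"]]  # stand-in corpus
ft_model = FastText(vector_size=prior_kv.vector_size, min_count=1)
ft_model.build_vocab(corpus_iterable=sentences)

# 3. Copy each prior vector over the randomly-initialized full-word vector.
#    It's vectors_vocab (the full-word vectors used in training) that matters,
#    not .vectors (the full-word + ngram composite).
copied = 0
for word, idx in ft_model.wv.key_to_index.items():
    if word in prior_kv.key_to_index:
        ft_model.wv.vectors_vocab[idx] = prior_kv[word]
        copied += 1
print(f"seeded {copied} of {len(ft_model.wv)} words from the prior vectors")

# 4. Continue training on your own data; the ngram vectors still start random.
ft_model.train(corpus_iterable=sentences,
               total_examples=ft_model.corpus_count,
               epochs=ft_model.epochs)
```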
But the tradeoffs of such an ad-hoc strategy are many. The ngrams would still start random. Taking some prior model's word-only vectors isn't quite the same as a FastText model's full-word vectors, which were trained to be later mixed with ngrams.
You'd want to make sure your new model's sense of word-frequencies is meaningful, as those affect further training - but that data isn't usually available with a plain-text prior word-vector set. (You could plausibly synthesize a good-enough set of frequencies by assuming a Zipf distribution.)
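For example, a word2vec-format text file usually lists words in roughly descending-frequency order, so one speculative approach is to assign each word a synthetic count proportional to 1/rank:

```python
from gensim.models import KeyedVectors

# Speculative sketch: fabricate Zipf-like counts for a plain word-vector file,
# relying on the (common but not guaranteed) convention that words are stored
# in descending frequency order.
prior_kv = KeyedVectors.load_word2vec_format('model.txt')

synthetic_freq = {
    word: max(1, int(1_000_000 / (rank + 1)))   # count ~ C / rank
    for rank, word in enumerate(prior_kv.index_to_key)
}

# These counts could then seed the new model's vocabulary (e.g. via
# build_vocab_from_freq()) instead of being scanned from a corpus you don't have.
```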
Your further training might get a "running start" from such initialization - but that wouldn't necessarily mean the end-vectors remain comparable to the starting ones. (All positions may be arbitrarily changed by the volume of newer training, progressively diluting away most of the prior influence.)
So: you'd be in an improvised/experimental setup, somewhat far from usual FastText practices, where you'd want to re-verify lots of assumptions and rigorously evaluate whether those extra steps/approximations are actually improving things.