I trained a machine learning sentence classification model that uses, among other features, the vectors obtained from a pretrained fastText model (like these), which is 7 GB. I use the pretrained fastText Italian model, and I only use this word embedding to get some semantic features to feed into the final ML model.
I built a simple API based on fastText that, at prediction time, computes the vectors needed by the final ML model. Under the hood, this API receives a string as input and calls get_sentence_vector. When the API starts, it loads the fastText model into memory.
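To give an idea, here is a minimal sketch of what the API does (Flask and the model file name are just illustrative choices here, not my exact setup):

```python
import fasttext
from flask import Flask, jsonify, request

app = Flask(__name__)

# The whole pretrained .bin model (~7 GB) is loaded into RAM once, at startup
ft_model = fasttext.load_model("cc.it.300.bin")

@app.route("/vectorize", methods=["POST"])
def vectorize():
    sentence = request.get_json()["text"]
    # get_sentence_vector averages the (normalized) word vectors of the sentence
    vector = ft_model.get_sentence_vector(sentence)
    return jsonify(vector.tolist())
```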
How can I reduce the memory footprint of fastText, which is loaded into RAM?
Constraints:
At the moment, I'm starting to experiment with compress-fasttext...
Please share your suggestions and thoughts even if they do not represent full-fledged solutions.
There is no easy solution for my specific problem: if you are using a fastText embedding as a feature extractor and then want to use a compressed version of that embedding, you have to retrain the final classifier, since the produced vectors are somewhat different.
Anyway, I want to give a general answer for the two typical scenarios.
You are using pretrained embeddings provided by Facebook, or you trained your own embeddings in an unsupervised fashion, in .bin format, and you now want to reduce model size/memory consumption.
Straightforward solutions:
compress-fasttext library: compresses fastText word embedding models by orders of magnitude without significantly affecting their quality; several pretrained compressed models are also available (other interesting compressed models here). A minimal usage sketch is shown after this list.
fastText native reduce_model: in this case, you are reducing the vector dimension (e.g. from 300 to 100), so you are explicitly losing expressiveness; under the hood, this method employs PCA. A sketch of this is also included after the list.
If you have training data and can retrain, you can use floret, a fastText fork by Explosion (the company behind spaCy), which uses a more compact representation for vectors.
If you are not interested in fastText's ability to represent out-of-vocabulary words (words not seen during training), you can use the .vec file (which contains only the vectors, not the model weights) and keep only a portion of the most common vectors (e.g. the first 200k words/vectors). If you need a way to convert .bin to .vec, read this answer. Note: the gensim package fully supports fastText embeddings (unsupervised mode), so these operations can be done through that library (more details in this answer); a gensim sketch is shown after this list.
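A minimal sketch of the compress-fasttext workflow, based on my reading of its README (paths and pruning parameters are illustrative):

```python
import gensim
import compress_fasttext

# Load the original unsupervised .bin embeddings with gensim
big_model = gensim.models.fasttext.load_facebook_model("cc.it.300.bin").wv

# Prune the vocabulary/ngram buckets and quantize the remaining vectors
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save("cc.it.300.compressed.bin")

# At prediction time, load only the compressed model
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    "cc.it.300.compressed.bin"
)
vector = small_model["ciao"]  # subword-based lookup still works for OOV words
```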
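A sketch of the native dimensionality reduction (the target dimension 100 is just an example):

```python
import fasttext
import fasttext.util

ft = fasttext.load_model("cc.it.300.bin")  # original 300-dim model
fasttext.util.reduce_model(ft, 100)        # PCA-based reduction, in place
print(ft.get_dimension())                  # 100
ft.save_model("cc.it.100.bin")
```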
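And a sketch of loading only the most frequent vectors from a .vec file with gensim (the 200k cut-off is arbitrary):

```python
from gensim.models import KeyedVectors

# .vec files are plain word2vec text format; `limit` keeps only the first
# (most frequent) 200k vectors, drastically reducing memory usage
vectors = KeyedVectors.load_word2vec_format("cc.it.300.vec", limit=200000)

vector = vectors["ciao"]  # works only for in-vocabulary words (no subword fallback)
```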
You used fastText to train a classifier, producing a .bin model. Now you want to reduce classifier size/memory consumption.
quantize: the model is retrained applying weights quantization and feature selection. With the retrain parameter, you can decide whether to fine-tune the embeddings or not (a sketch follows below).
reduce_model: this also works here, but it leads to less expressive models and the size of the model is not heavily reduced.
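A sketch of quantization (file names are illustrative; note that quantize needs the original training data):

```python
import fasttext

# Train (or load and retrain) the supervised classifier
model = fasttext.train_supervised(input="train.txt")

# Quantize: feature selection via `cutoff` plus weight quantization;
# retrain=True also fine-tunes the embeddings after the cutoff
model.quantize(input="train.txt", qnorm=True, retrain=True, cutoff=100000)
model.save_model("model.ftz")
```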