I trained a machine learning sentence classification model that uses, among other features, the vectors obtained from a pretrained fastText model (like these), which is 7 GB. I use the pretrained fastText Italian model, and I only use this word embedding to get some semantic features to feed into the final ML model.
I built a simple API based on fastText that, at prediction time, computes the vectors needed by the final ML model. Under the hood, this API receives a string as input and calls get_sentence_vector. When the API starts, it loads the fastText model into memory.
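To give an idea, here is a minimal sketch of what the API does (Flask and the model file name are just illustrative choices here, not my exact setup):

```python
import fasttext
from flask import Flask, jsonify, request

app = Flask(__name__)

# The whole pretrained .bin model (~7 GB) is loaded into RAM once, at startup
ft_model = fasttext.load_model("cc.it.300.bin")

@app.route("/vectorize", methods=["POST"])
def vectorize():
    sentence = request.get_json()["text"]
    # get_sentence_vector averages the (normalized) word vectors of the sentence
    vector = ft_model.get_sentence_vector(sentence)
    return jsonify(vector.tolist())
```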
How can I reduce the memory footprint of fastText, which is loaded into RAM?
Constraints:
At the moment, I'm starting to experiment with compress-fasttext...
Please share your suggestions and thoughts even if they do not represent full-fledged solutions.
There is no easy solution for my specific problem: if you are using a fastText embedding as a feature extractor and then want to use a compressed version of that embedding, you have to retrain the final classifier, since the produced vectors are somewhat different.
Anyway, I want to give a general answer for the two typical scenarios.
You are using pretrained embeddings provided by Facebook, or you trained your own embeddings in an unsupervised fashion, in .bin format, and you now want to reduce model size/memory consumption.
Straightforward solutions:
compress-fasttext library: compresses fastText word embedding models by orders of magnitude without significantly affecting their quality; several pretrained compressed models are also available (other interesting compressed models here). A minimal usage sketch is shown after this list.
fastText native reduce_model: in this case, you are reducing the vector dimension (e.g. from 300 to 100), so you are explicitly losing expressiveness; under the hood, this method employs PCA. A sketch of this is also included after the list.
If you have training data and can retrain, you can use floret, a fastText fork by Explosion (the company behind spaCy), which uses a more compact representation for vectors.
If you are not interested in fastText's ability to represent out-of-vocabulary words (words not seen during training), you can use the .vec file (which contains only the vectors, not the model weights) and keep only a portion of the most common vectors (e.g. the first 200k words/vectors). If you need a way to convert .bin to .vec, read this answer. Note: the gensim package fully supports fastText embeddings (unsupervised mode), so these operations can be done through that library (more details in this answer); a gensim sketch is shown after this list.
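A minimal sketch of the compress-fasttext workflow, based on my reading of its README (paths and pruning parameters are illustrative):

```python
import gensim
import compress_fasttext

# Load the original unsupervised .bin embeddings with gensim
big_model = gensim.models.fasttext.load_facebook_model("cc.it.300.bin").wv

# Prune the vocabulary/ngram buckets and quantize the remaining vectors
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save("cc.it.300.compressed.bin")

# At prediction time, load only the compressed model
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    "cc.it.300.compressed.bin"
)
vector = small_model["ciao"]  # subword-based lookup still works for OOV words
```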
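A sketch of the native dimensionality reduction (the target dimension 100 is just an example):

```python
import fasttext
import fasttext.util

ft = fasttext.load_model("cc.it.300.bin")  # original 300-dim model
fasttext.util.reduce_model(ft, 100)        # PCA-based reduction, in place
print(ft.get_dimension())                  # 100
ft.save_model("cc.it.100.bin")
```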
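And a sketch of loading only the most frequent vectors from a .vec file with gensim (the 200k cut-off is arbitrary):

```python
from gensim.models import KeyedVectors

# .vec files are plain word2vec text format; `limit` keeps only the first
# (most frequent) 200k vectors, drastically reducing memory usage
vectors = KeyedVectors.load_word2vec_format("cc.it.300.vec", limit=200000)

vector = vectors["ciao"]  # works only for in-vocabulary words (no subword fallback)
```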
You used fastText to train a classifier, producing a .bin model. Now you want to reduce classifier size/memory consumption.
quantize: the model is retrained applying weights quantization and feature selection. With the retrain parameter, you can decide whether to fine-tune the embeddings or not (a sketch follows below).
reduce_model: this also works here, but it leads to less expressive models and the size of the model is not heavily reduced.
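A sketch of quantization (file names are illustrative; note that quantize needs the original training data):

```python
import fasttext

# Train (or load and retrain) the supervised classifier
model = fasttext.train_supervised(input="train.txt")

# Quantize: feature selection via `cutoff` plus weight quantization;
# retrain=True also fine-tunes the embeddings after the cutoff
model.quantize(input="train.txt", qnorm=True, retrain=True, cutoff=100000)
model.save_model("model.ftz")
```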