python pickle binaryfiles fasttext vetiver

Is there a way to pickle a fasttext model/object?


I just trained my fasttext model and I am trying to pin it using pins (https://pypi.org/project/pins/) and vetiver (https://pypi.org/project/vetiver/) for version control.
However, for that to happen I need to pickle the fasttext model object, and that is where I am struggling.
PS: When I save the fasttext model to disk, it is saved as a binary .bin file. Here is how the code looks when using pins:

import pins
import fasttext
board = pins.board_temp(allow_pickle_read = True)
board.pin_write(ft_model, "ft_model", type="joblib")  #ft_model is a fasttext model I already trained

The error I get when running these lines is: cannot pickle 'fasttext_pybind.fasttext' object

The same happens when I use vetiver:

import fasttext
import pins
import vetiver
from vetiver.handlers.base import BaseHandler

# Minimal custom handler around the fasttext model I already trained
class FasttextHandler(BaseHandler):
    def __init__(self, model, ptype_data):
        super().__init__(model, ptype_data)

handled_model = FasttextHandler(model = ft_model, ptype_data = None )
vetiver_fasttext_model = vetiver.VetiverModel(model = handled_model, model_name = "model")
ft_board = pins.board_temp(allow_pickle_read = True)
vetiver.vetiver_pin_write(ft_board, vetiver_fasttext_model)  # fails with the pickle error

Again, the error I get for this snippet is: cannot pickle 'fasttext_pybind.fasttext' object

I appreciate any help or any tips,

Thank you kindly!

Jamal


Solution

  • The official Facebook fasttext module relies on Facebook's non-Python implementation and storage format (the fasttext_pybind in your error message is its compiled binding), so that's likely the pickle-resistant barrier you're hitting.

    If you're not using the supervised classification mode, the Gensim library, written entirely in Python & Cython, includes a FastText model class which does everything except that mode. It can also load/save Facebook-format models.

    While Gensim's own native .save() operation uses a mixture of pickling & raw numpy array files, for historic & efficiency reasons, its models should also be amenable to complete pickling (if you're using a recent Python & your project is otherwise OK with the full overhead).

    If you still need features from the Facebook fasttext, like the supervised mode, you might have to wrap its native objects, which have unpickleable parts, in proxy objects that intercept pickle-serialization attempts and use the native save/load format to simulate pickle-ability.

    For example, on serialization, ask the wrapped object to write itself in its usual way, then pickle-serialize the entire raw native file as one raw-data field of your wrapper object. On deserialization, take that giant raw file field, write it to disk, then use the wrapped class's native load.

    It'd be rather slow & ugly, and involve a large amount of extra temporary addressable memory usage during marshalling between the two serialization formats. But if you have no other option, & your systems can tolerate the delay/memory usage, it would let you use native fasttext models in your desired pins/vetiver-based architecture.
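
    The wrapper idea above could be sketched roughly as follows. This is a minimal, untested-against-pins illustration, not an official recipe: the class name PicklableFastText and the "model_bytes" field are made up for this example, and it assumes that fasttext's save_model() and fasttext.load_model() round-trip the model through a file and that the loaded model no longer needs the file once load_model() returns.

```python
import os
import tempfile


class PicklableFastText:
    """Hypothetical proxy: pickles a native fasttext model as raw .bin bytes."""

    def __init__(self, model):
        self.model = model  # the native (unpicklable) fasttext model

    def __getstate__(self):
        # On serialization: let the native object write itself in its usual
        # format, then carry the whole file as one bytes field.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "model.bin")
            self.model.save_model(path)
            with open(path, "rb") as f:
                return {"model_bytes": f.read()}

    def __setstate__(self, state):
        # On deserialization: write the bytes back to disk and use the
        # native loader. Imported lazily so the class itself can be
        # defined without fasttext installed.
        import fasttext

        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "model.bin")
            with open(path, "wb") as f:
                f.write(state["model_bytes"])
            self.model = fasttext.load_model(path)

    def __getattr__(self, name):
        # Forward other attribute access (predict, get_word_vector, ...) to
        # the wrapped model. The guard avoids infinite recursion while
        # `model` is not set yet, e.g. in the middle of unpickling.
        if name == "model":
            raise AttributeError(name)
        return getattr(self.model, name)
```

    With this in place, something like board.pin_write(PicklableFastText(ft_model), "ft_model", type="joblib") from the question should have a picklable object to work with, at the cost of holding the full .bin contents in memory twice during each save or load. The class must also be importable wherever the pin is read back.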