I just trained a fasttext model and I am trying to pin it using pins (https://pypi.org/project/pins/) and vetiver (https://pypi.org/project/vetiver/) for version control. However, for that to happen I need to pickle the fasttext object/model, and that is where I am struggling.
PS: when I save the fasttext model to disk, it saves as a .bin (binary) file.
Here is how the code looks when using pins:
import pins
import fasttext

board = pins.board_temp(allow_pickle_read=True)
# ft_model is a fasttext model I already trained
board.pin_write(ft_model, "ft_model", type="joblib")
The error I get when running these lines is:
cannot pickle 'fasttext_pybind.fasttext' object
The same happens when I use vetiver:
import vetiver
import fasttext
import pins
from vetiver.handlers.base import BaseHandler  # import path may vary by vetiver version

class FasttextHandler(BaseHandler):
    def __init__(self, model, ptype_data):
        super().__init__(model, ptype_data)

handled_model = FasttextHandler(model=ft_model, ptype_data=None)
vetiver_fasttext_model = vetiver.VetiverModel(model=handled_model, model_name="model")

ft_board = pins.board_temp(allow_pickle_read=True)
vetiver.vetiver_pin_write(ft_board, vetiver_fasttext_model)
Again, the error I get for this snippet of code is: cannot pickle 'fasttext_pybind.fasttext' object
I appreciate any help or any tips,
Thank you kindly!
Jamal
The official Facebook fasttext module relies on Facebook's non-Python implementation and storage format, so that's likely the pickle-resistant barrier you're hitting.
If you're not using the supervised classification mode, the pure-Python-&-Cython Gensim library includes a FastText model class which does everything except that mode. It can also load/save Facebook-format models.
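For example, here's a minimal sketch of that route; the path "ft_model.bin" is a placeholder for wherever your saved model lives, and Gensim's loader only handles unsupervised models:

from gensim.models.fasttext import load_facebook_model

# Load a full Facebook-format .bin (not just its vectors) into Gensim.
# "ft_model.bin" is a placeholder path; supervised-mode models are rejected.
g_model = load_facebook_model("ft_model.bin")
print(g_model.wv["example"])  # word-vector lookups behave as before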
While Gensim's own native .save() operation uses a mixture of pickling & raw numpy array files, for historic & efficiency reasons, its models should also be amenable to complete pickling (if you're using a recent Python & your project is otherwise OK with the full overhead).
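For instance, a quick sanity check plus a direct pin, assuming g_model is the Gensim FastText model from the sketch above (board & pin names are placeholders):

import pickle
import pins

# Unlike the fasttext_pybind object, the Gensim model survives a pickle round-trip...
restored = pickle.loads(pickle.dumps(g_model))

# ...so pins can store it directly via its pickle-based joblib type.
board = pins.board_temp(allow_pickle_read=True)
board.pin_write(g_model, "ft_model", type="joblib")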
If you still need features from the Facebook fasttext package, like the supervised mode, you might have to wrap their native objects, unpickleable parts and all, in proxy objects that intercept pickle-serialization attempts and leverage the custom native format to simulate pickle-ability.
For example, on serialization, ask the wrapped object to write itself out in its usual way, then pickle the entire raw native file as one raw-data field of your wrapper object. On deserialization, explicitly take that giant raw field, write it to disk, then use the wrapped class's native load.
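A rough sketch of that wrapper follows; the class name PickleableFastText & the temp-file plumbing are inventions for illustration, though save_model() & fasttext.load_model() are the real native-format entry points:

import os
import tempfile
import fasttext

class PickleableFastText:
    """Hypothetical proxy making a native fasttext model pickleable."""

    def __init__(self, model):
        self.model = model  # the wrapped native fasttext model

    def __getstate__(self):
        # On pickling: have fasttext write its usual .bin to a temp file,
        # then capture the whole file as one raw-bytes field.
        fd, path = tempfile.mkstemp(suffix=".bin")
        os.close(fd)
        try:
            self.model.save_model(path)
            with open(path, "rb") as f:
                return {"raw_bytes": f.read()}
        finally:
            os.remove(path)

    def __setstate__(self, state):
        # On unpickling: write the raw bytes back to disk, then use the
        # wrapped class's native load.
        fd, path = tempfile.mkstemp(suffix=".bin")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(state["raw_bytes"])
            self.model = fasttext.load_model(path)
        finally:
            os.remove(path)

    def __getattr__(self, name):
        # Delegate predict(), get_word_vector(), etc. to the wrapped model;
        # guard against recursion before self.model exists.
        if name == "model":
            raise AttributeError(name)
        return getattr(self.model, name)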
It'd be rather slow & ugly, and involve a large amount of extra temporary addressable-memory usage during marshalling between the two serialization formats - but perhaps, if you have no other option & your systems have enough tolerance for the delay/memory usage, it would let you use native fasttext models in your desired pins/vetiver-based architecture.
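Wired into pins, usage of such a wrapper might look like this (same placeholder names as above):

import pins

board = pins.board_temp(allow_pickle_read=True)
board.pin_write(PickleableFastText(ft_model), "ft_model", type="joblib")

restored = board.pin_read("ft_model")  # rebuilt via __setstate__ under the hood
# restored.predict(...) & friends now delegate to the reloaded native model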