Is there an accepted solution for saving sklearn objects to JSON instead of pickling them?
I'm interested in this because saving to JSON will take up much less storage and make saving the objects to databases like Redis much more straightforward.
In particular, for something like a ColumnTransformer, all I need is the mean and standard deviation for a specific feature. With those, I can easily rebuild the transformer, but reconstructing the transformer object from the saved JSON means manually setting learned and private attributes, which feels hacky.
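For concreteness, the manual approach described above might look roughly like the sketch below. It assumes the step of interest is a StandardScaler (the transformer that actually holds a per-feature mean and standard deviation); the helper names scaler_to_json and scaler_from_json are made up for illustration and are not part of sklearn.

```python
import json

import numpy as np
from sklearn.preprocessing import StandardScaler


def scaler_to_json(scaler):
    """Serialize the fitted statistics of a StandardScaler to a JSON string."""
    return json.dumps({
        "mean": scaler.mean_.tolist(),
        "scale": scaler.scale_.tolist(),
        "var": scaler.var_.tolist(),
    })


def scaler_from_json(payload):
    """Rebuild a StandardScaler by setting its learned attributes by hand."""
    state = json.loads(payload)
    scaler = StandardScaler()
    # These trailing-underscore attributes are normally set by fit().
    scaler.mean_ = np.asarray(state["mean"], dtype=float)
    scaler.scale_ = np.asarray(state["scale"], dtype=float)
    scaler.var_ = np.asarray(state["var"], dtype=float)
    scaler.n_features_in_ = scaler.mean_.shape[0]
    return scaler


# Toy data, purely illustrative.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
original = StandardScaler().fit(X)
payload = scaler_to_json(original)
restored = scaler_from_json(payload)
```

The restored scaler should transform data the same way as the original, but this relies on knowing which fitted attributes transform() reads, which is exactly the part that feels fragile across sklearn versions.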
The closest thing I've found is this article: https://stackabuse.com/scikit-learn-save-and-restore-models/
Is this how others are going about this?
What is stopping sklearn from building this functionality into the library?
I think this package is what you are looking for: https://pypi.org/project/sklearn-json/
Its description reads: "Export scikit-learn model files to JSON for sharing or deploying predictive models with peace of mind."
The following snippet is from the link above and shows how to export sklearn models to JSON:
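# X, y, and file_name are not defined in the original snippet; the values below are placeholders.
import sklearn_json as skljson
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
X, y = load_iris(return_X_y=True)
file_name = "model.json"
model = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=0).fit(X, y)
skljson.to_json(model, file_name)  # write the fitted model to a JSON file
deserialized_model = skljson.from_json(file_name)  # rebuild the model from that file
deserialized_model.predict(X)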
Furthermore, to answer the JSON vs. pickle question, this might be helpful: "Pickle or json?"