I am trying to load my saved model from s3
using joblib
import pandas as pd
import numpy as np
import json
import subprocess
import sqlalchemy
from sklearn.externals import joblib
ENV = 'dev'
model_d2v = load_d2v('model_d2v_version_002', ENV)
def load_d2v(fname, env):
model_name = fname
if env == 'dev':
try:
model=joblib.load(model_name)
except:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
else:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
return model
But I get this error:
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals' (C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\externals\__init__.py)
Then I tried installing joblib
directly by doing
import joblib
but it gave me this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in load_d2v_from_s3
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
obj = unpickler.load()
File "/usr/lib64/python3.7/pickle.py", line 1088, in load
dispatch[key[0]](self)
File "/usr/lib64/python3.7/pickle.py", line 1376, in load_global
klass = self.find_class(module, name)
File "/usr/lib64/python3.7/pickle.py", line 1426, in find_class
__import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
Can you tell me how to solve this?
It looks like your existing pickle save file (model_d2v_version_002
) encodes a reference module in a non-standard location – a joblib
that's in sklearn.externals.joblib
rather than at top-level.
The current scikit-learn
documentation only talks about a top-level joblib
– eg in 3.4.1 Persistence example – but I do see a reference in someone else's old issue to a DeprecationWarning in scikit-learn
version 0.21 about an older scikit.external.joblib
variant going away:
Python37\lib\site-packages\sklearn\externals\joblib_init_.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
'Deprecation' means marking something as inadvisable to rely-upon, as it is likely to be discontinued in a future release (often, but not always, with a recommended newer way to do the same thing).
I suspect your model_d2v_version_002
file was saved from an older version of scikit-learn
, and you're now using scikit-learn
(aka sklearn
) version 0.23+ which has totally removed the sklearn.external.joblib
variation. Thus your file can't be directly or easily loaded to your current environment.
But, per the DeprecationWarning
, you can probably temporarily use an older scikit-learn
version to load the file the old way once, then re-save it with the now-preferred way. Given the warning info, this would probably require scikit-learn
version 0.21.x or 0.22.x, but if you know exactly which version your model_d2v_version_002
file was saved from, I'd try to use that. The steps would roughly be:
create a temporary working environment (or roll back your current working environment) with the older sklearn
do imports something like:
import sklearn.external.joblib as extjoblib
import joblib
extjoblib.load()
your old file as you'd planned, but then immediately re-joblib.dump()
the file using the top-level joblib
. (You likely want to use a distinct name, to keep the older file around, just in case.)
move/update to your real, modern environment, and only import joblib
(top level) to use joblib.load()
- no longer having any references to `sklearn.external.joblib' in either your code, or your stored pickle files.