I'm trying to import a model from my local machine, but every time I get the same error in the GCP logs. The framework is scikit-learn.
AttributeError: Can't get attribute 'preprocess_text' on <module 'model_server' from '/usr/app/model_server.py'>
The code snippet with this problem is
complaints_clf_pipeline = Pipeline(
    [
        ("preprocess", text.TfidfVectorizer(preprocessor=utils.preprocess_text, ngram_range=(1, 2))),
        ("clf", naive_bayes.MultinomialNB(alpha=0.3)),
    ]
)
The preprocess_text function comes from the cell above, but I keep getting this error about model_server, which is not present in my code.
Can someone help?
I tried refactoring the code but got the same error. I also tried removing the pipeline structure, but then I got a different error when querying the model through the API.
GCP is trying to load the model, but it can't find the preprocess_text function because it's not included in the serialized model.
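To see why the function goes missing, here is a minimal sketch using only the standard library. The module name notebook_cell is a hypothetical stand-in for wherever preprocess_text was defined at training time, and the lower-casing body is a placeholder. It suggests that pickle stores only the module and function names, and that loading fails with an AttributeError once the name is no longer there.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the place where the function was defined
# at training time (for example a notebook cell or a helper module).
notebook = types.ModuleType("notebook_cell")
exec("def preprocess_text(text):\n    return text.lower()\n", notebook.__dict__)
sys.modules["notebook_cell"] = notebook

# Pickling a function stores a reference (module name + function name),
# not the function's code.
payload = pickle.dumps(notebook.preprocess_text)
print(b"notebook_cell" in payload, b"preprocess_text" in payload)

# Simulate the serving environment: the module can be imported,
# but it does not define preprocess_text.
del notebook.preprocess_text

error_message = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The same lookup happens for a function stored inside a pickled pipeline, so the loading process needs to be able to resolve that module and name.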
When you save a scikit-learn pipeline, functions like preprocess_text are not saved with the model: pickle records only the module and name of the function, not its code. To make sure GCP can find the function when it loads the model, you can either:
1. Define preprocess_text inside the same script that loads the model, or
2. Package utils as part of your deployment (include it in your GCP deployment files) so that preprocess_text is available in the serving environment.
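The packaging option can be sketched roughly as follows, again with the standard library only. The module name text_utils and the temporary directory are assumptions for illustration (the question uses a module called utils), and the function body is a placeholder. The idea is that if the same module file is importable where the model is loaded, pickle can resolve the reference on its own.

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# Write a small helper module to disk. In a real deployment this file
# would be shipped alongside the server script instead of a temp dir.
package_dir = pathlib.Path(tempfile.mkdtemp())
(package_dir / "text_utils.py").write_text(
    "def preprocess_text(text):\n    return text.lower()\n"
)
sys.path.insert(0, str(package_dir))
importlib.invalidate_caches()

# Training side: import the function from the module and pickle it.
text_utils = importlib.import_module("text_utils")
payload = pickle.dumps(text_utils.preprocess_text)

# Serving side: drop the cached module to mimic a fresh process.
# pickle re-imports text_utils because it is on the import path.
del sys.modules["text_utils"]
restored = pickle.loads(payload)

print(restored("Hello GCP"))  # hello gcp
```

The module has to be importable under the same name on both sides, because that name is what gets written into the pickle.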
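# Alternative: keep the preprocessing function on a class together with the pipeline.
# Note: pickle also stores classes by reference, so CustomTextClassifier itself
# must be importable in the environment that loads model.pkl.
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


class CustomTextClassifier:
    def __init__(self):
        self.pipeline = Pipeline(
            [
                ("preprocess", TfidfVectorizer(preprocessor=self.preprocess_text, ngram_range=(1, 2))),
                ("clf", MultinomialNB(alpha=0.3)),
            ]
        )

    def preprocess_text(self, text):
        # Placeholder preprocessing; replace with your own logic.
        return text.lower()

    def train(self, X, y):
        self.pipeline.fit(X, y)

    def predict(self, X):
        return self.pipeline.predict(X)


model = CustomTextClassifier()
# train model with your data...

# Serialize the whole wrapper object, including the fitted pipeline.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)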