I've trained a sentiment analysis classifier on a TripAdvisor text-review dataset. It predicts a review's rating from its sentiment. Training and testing both work fine.
However, when I load the classifier in a new .ipynb file and try to predict the rating of a review, I get:
NotFittedError: The TF-IDF vectorizer is not fitted
This is the detailed error:
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/777236349.py in <module>
----> 1 prediction(test_str,HotelModel1000)
/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/1165328373.py in prediction(text, model)
4 cw = clean_string(text)
5 cw = tokenize(cw)
----> 6 cw = tfidf_vectorizer.transform([cw])
7 result = model.predict(cw)
8 print("Expected rating:",int(result))
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in transform(self, raw_documents, copy)
1869 Tf-idf-weighted document-term matrix.
1870 """
-> 1871 check_is_fitted(self, msg='The TF-IDF vectorizer is not fitted')
1872
1873 # FIXME Remove copy parameter support in 0.24
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
1018
1019 if not attrs:
-> 1020 raise NotFittedError(msg % {'name': type(estimator).__name__})
1021
1022
NotFittedError: The TF-IDF vectorizer is not fitted
Here is my code to predict:
HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))
test_str = input('')
prediction(test_str,HotelModel)
Here is the prediction() function I call:
tfidf_vectorizer = TfidfVectorizer(max_features=5000,ngram_range=(2,2))
def prediction(text,model):
    cw = clean_string(text)
    cw = tokenize(cw)
    cw = tfidf_vectorizer.transform([cw])
    result = model.predict(cw)
    print("Expected rating:",int(result))
    print("\nThe confidence of the prediction is:",model.predict_proba(cw)[0][int(result)-1])
As mentioned in the comment, you have correctly loaded the trained model from the pickle file:
HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))
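The model can predict because you saved its fitted version. Similarly, tfidf_vectorizer also needs to be the fitted version. In your prediction code you create a brand-new TfidfVectorizer, which has never been fitted, so transform() raises NotFittedError.
You have to pickle the fitted tfidf_vectorizer in the training notebook, then load it from the pickle file before using it.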
If you are using an SVM-based model, keep an eye on the vectorizer's feature length when fine-tuning, since the model expects the same number of features it was trained on.
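The pickle-and-reload step for the vectorizer could look roughly like the sketch below. It is only an illustration under assumptions: the toy reviews, the `SVC(kernel="linear")` model, the temporary directory, and the file names `model.pickle` and `tfidf_vectorizer.pickle` are all made up here, and your `clean_string()` and `tokenize()` steps are left out because their code is not shown in the question.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Toy training data, invented for this sketch only.
train_texts = [
    "great hotel lovely staff",
    "terrible room dirty bathroom",
    "lovely staff great location",
    "dirty bathroom terrible service",
]
train_ratings = [5, 1, 5, 1]

# --- Training notebook ---
tfidf_vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(2, 2))
X_train = tfidf_vectorizer.fit_transform(train_texts)  # the vectorizer is fitted here
model = SVC(kernel="linear").fit(X_train, train_ratings)

# Save BOTH fitted objects (hypothetical file names in a temp directory).
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.pickle")
vectorizer_path = os.path.join(out_dir, "tfidf_vectorizer.pickle")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(vectorizer_path, "wb") as f:
    pickle.dump(tfidf_vectorizer, f)

# --- Prediction notebook ---
# Load the fitted vectorizer instead of constructing a new TfidfVectorizer.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
with open(vectorizer_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)


def prediction(text, model, vectorizer):
    """Vectorize one review with the fitted vectorizer and return the rating."""
    features = vectorizer.transform([text])
    return int(model.predict(features)[0])


predicted = prediction("great hotel lovely staff", loaded_model, loaded_vectorizer)
print("Expected rating:", predicted)
```

Passing the vectorizer into `prediction()` as an argument, rather than relying on a module-level `tfidf_vectorizer`, makes it harder to accidentally use an unfitted instance in a fresh notebook.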