python machine-learning scikit-learn tf-idf tfidfvectorizer

NotFittedError: The TF-IDF vectorizer is not fitted


I've trained a sentiment analysis classifier on TripAdvisor's text review datasets. It predicts a review's rating based on its sentiment. Training and testing both work fine.

However, when I load the classifier in a new .ipynb file and try to predict the rating of a review, I get this error:

 NotFittedError: The TF-IDF vectorizer is not fitted

This is the detailed error:

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/777236349.py in <module>
----> 1 prediction(test_str,HotelModel1000)

/var/folders/rn/vqtp35xn15zd9d5scq3rxsth0000gn/T/ipykernel_71297/1165328373.py in prediction(text, model)
      4     cw = clean_string(text)
      5     cw = tokenize(cw)
----> 6     cw = tfidf_vectorizer.transform([cw])
      7     result = model.predict(cw)
      8     print("Expected rating:",int(result))

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in transform(self, raw_documents, copy)
   1869             Tf-idf-weighted document-term matrix.
   1870         """
-> 1871         check_is_fitted(self, msg='The TF-IDF vectorizer is not fitted')
   1872 
   1873         # FIXME Remove copy parameter support in 0.24

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
   1018 
   1019     if not attrs:
-> 1020         raise NotFittedError(msg % {'name': type(estimator).__name__})
   1021 
   1022 

NotFittedError: The TF-IDF vectorizer is not fitted

Here is my code to predict:

HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))
test_str = input('')
prediction(test_str,HotelModel)

Here is the prediction() function I called:

tfidf_vectorizer = TfidfVectorizer(max_features=5000,ngram_range=(2,2))

def prediction(text,model):
    cw = clean_string(text)
    cw = tokenize(cw)
    cw = tfidf_vectorizer.transform([cw])
    result = model.predict(cw)
    print("Expected rating:",int(result)) 
    print("\nThe confidence of the prediction is:",model.predict_proba(cw)[0][int(result)-1])

Solution

  • As mentioned in the comments, you have correctly loaded the trained model from the pickle file:

    HotelModel = pickle.load(open('./models/TripAdvisorHotels_SVM_Model1000(2).pickle','rb'))
    

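    It can make predictions because you saved the fitted version.

    Similarly, tfidf_vectorizer needs to be the fitted version. Calling TfidfVectorizer(...) in the new notebook creates a fresh, unfitted vectorizer with no learned vocabulary, which is why transform() raises NotFittedError.

    You have to pickle the fitted tfidf_vectorizer after training, then load it from the pickle file and use it for prediction.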

    If you are using an SVM-based model, keep an eye on the vectorizer's feature length when fine-tuning: the number of features it produces must match what the model was trained on.
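The save-and-load round trip described above could look roughly like the sketch below. It is a minimal illustration, not your actual training code: the toy reviews, the LinearSVC stand-in model, and the file names are all assumptions, and clean_string() and tokenize() are left out because they are not shown in the question.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy stand-in data (hypothetical); the real reviews and ratings are not shown.
train_texts = [
    "the room was dirty and the staff was rude",
    "the staff was rude and the room was noisy",
    "the room was clean and the staff was friendly",
    "the staff was friendly and the room was quiet",
]
train_ratings = [1, 1, 5, 5]

# --- Training notebook: fit the vectorizer and the model, then save BOTH ---
tfidf_vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(2, 2))
# fit_transform learns the vocabulary and IDF weights; this is the "fitted" state.
X_train = tfidf_vectorizer.fit_transform(train_texts)
model = LinearSVC().fit(X_train, train_ratings)  # stand-in for the SVM in the question

model_dir = tempfile.mkdtemp()  # hypothetical location; use your own ./models folder
vectorizer_path = os.path.join(model_dir, "tfidf_vectorizer.pickle")  # hypothetical name
model_path = os.path.join(model_dir, "svm_model.pickle")  # hypothetical name

with open(vectorizer_path, "wb") as f:
    pickle.dump(tfidf_vectorizer, f)
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# --- Prediction notebook: load the fitted objects instead of creating new ones ---
with open(vectorizer_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)


def prediction(text, vectorizer, model):
    # transform() works here because the loaded vectorizer carries its fitted vocabulary.
    features = vectorizer.transform([text])
    return int(model.predict(features)[0])


predicted_rating = prediction(
    "the room was clean and the staff was friendly", loaded_vectorizer, loaded_model
)
print("Expected rating:", predicted_rating)
```

Because the loaded vectorizer is the same object that was fitted during training, it produces the same number of features the model expects, which also covers the feature-length concern mentioned above.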