I built a spam classifier with a random forest and wanted to write a separate function that classifies a single text message as spam or ham. I tried:
def predict_message(pred_text):
    pred_text = [pred_text]
    pred_text2 = tfidf_vect.fit_transform(pred_text)
    pred_features = pd.DataFrame(pred_text2.toarray())
    prediction = rf_model.predict(pred_features)
    return prediction

pred_text = "how are you doing today?"
prediction = predict_message(pred_text)
print(prediction)
but it gives me the error:
The number of features of the model must match the input.
Model n_features is 7985 and input n_features is 1
I can't see the problem. How can I make it work?
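By calling tfidf_vect.fit_transform(pred_text), you refit the vectorizer on the single message, so it loses everything it learned from your original training corpus. The vocabulary is rebuilt from that one message, and the resulting matrix no longer has the 7985 columns the model was trained on. You should call only transform, which reuses the vocabulary that was fitted during training.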
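To make the mechanism concrete, here is a minimal sketch using a toy vectorizer written in plain Python. ToyCountVectorizer is a hypothetical, simplified stand-in for illustration only, not scikit-learn's TfidfVectorizer (it counts whitespace-separated tokens and does no TF-IDF weighting), and the training sentences are made up. It shows that refitting on one message changes the number of columns, while transform keeps it fixed.

```python
class ToyCountVectorizer:
    """Hypothetical, simplified stand-in for a text vectorizer (illustration only)."""

    def fit(self, docs):
        # Learn one column per distinct lowercase token in the corpus.
        tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(tokens)}
        return self

    def transform(self, docs):
        # Count tokens against the already-learned vocabulary.
        # Tokens that were not seen during fit are ignored.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for tok in doc.lower().split():
                if tok in self.vocabulary_:
                    row[self.vocabulary_[tok]] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


# Made-up training corpus.
train_docs = ["win a free prize now", "are you free today", "call now to win"]
vect = ToyCountVectorizer()
train_rows = vect.fit_transform(train_docs)
n_train_features = len(train_rows[0])

message = ["how are you doing today"]

# Reusing the training vocabulary keeps the column count unchanged.
ok_row = vect.transform(message)[0]

# Fitting on the message alone builds a new, much smaller vocabulary.
# (A fresh instance is used here so that `vect` is not overwritten.)
refit_row = ToyCountVectorizer().fit_transform(message)[0]

print(n_train_features, len(ok_row), len(refit_row))  # prints: 10 10 5
```

A model trained on the 10-column matrix can accept ok_row but not refit_row, which is the same kind of mismatch the error message reports.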
The changes below should help:
def predict_message(pred_text):
    pred_text = [pred_text]
    pred_text2 = tfidf_vect.transform(pred_text)  # Changed
    prediction = rf_model.predict(pred_text2)
    return prediction

pred_text = "how are you doing today?"
prediction = predict_message(pred_text)
print(prediction)
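
One way to avoid this class of mistake, offered as a sketch and not as part of your existing code, is to bundle the vectorizer and the model into a single scikit-learn pipeline. The pipeline's predict only calls transform on the vectorizer, so it cannot be refitted by accident at prediction time. The training texts and labels below are made up for illustration, and the names spam_pipeline, train_texts and train_labels are assumptions; substitute your own corpus.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only.
train_texts = [
    "win a free prize now",
    "claim your free cash reward today",
    "urgent call now to win",
    "are you free for lunch today",
    "how is your day going",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# The vectorizer is fitted once, together with the model, on the training data.
spam_pipeline = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
spam_pipeline.fit(train_texts, train_labels)


def predict_message(pred_text):
    # predict() transforms the text with the fitted vocabulary, then classifies it.
    return spam_pipeline.predict([pred_text])


prediction = predict_message("how are you doing today?")
print(prediction)
```

The pipeline can also be saved and loaded as one object, which keeps the fitted vocabulary and the model together.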