pythonmachine-learningscikit-learnvalueerrorprecision-recall

Facing ValueError: Target is multiclass but average='binary'


I'm trying to use Naive Bayes algorithm for my dataset. I'm able to find out the accuracy but trying to find out precision and recall for the same. But, it is throwing the following error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting.

Can anyone please suggest me how to proceed with it. I have tried using average ='micro' in the precision and the recall scores. It worked without any errors but it is giving the same score for accuracy, precision, recall.

My dataset:

train_data.csv:

review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative

test_data.csv:

review,label
The picture is clear and beautiful,positive
Picture is not clear,negative

My code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import confusion_matrix

X_train, y_train = pd.read_csv('train_data.csv')
X_test, y_test = pd.read_csv('test_data.csv')

vec = CountVectorizer() 
X_train_transformed = vec.fit_transform(X_train) 
X_test_transformed = vec.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_transformed, y_train)

score = clf.score(X_test_transformed, y_test)

y_pred = clf.predict(X_test_transformed)
cm = confusion_matrix(y_test, y_pred)

precision = precision_score(y_test, y_pred, pos_label='positive')
recall = recall_score(y_test, y_pred, pos_label='positive')

Solution

  • You need to add the 'average' param. According to the documentation:

    average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

    This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

    Do this:

    print("Precision Score : ",precision_score(y_test, y_pred, 
                                               pos_label='positive'
                                               average='micro'))
    print("Recall Score : ",recall_score(y_test, y_pred, 
                                               pos_label='positive'
                                               average='micro'))
    

    Replace 'micro' with any one of the above options except 'binary'. Also, in the multiclass setting, there is no need to provide the 'pos_label' as it will be anyways ignored.

    Update for comment:

    Yes, they can be equal. Its given in the user guide here:

    Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while “weighted” averaging may produce an F-score that is not between precision and recall.