Tags: python, scikit-learn, metrics, multiclass-classification

Precision, Recall and F1 with Sklearn for a Multiclass problem


I have a multiclass problem, where 0 is my negative class and 1 and 2 are positive. Check the following code:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

# Outputs
y_true = np.array((1, 2, 2, 0, 1, 0))
y_pred = np.array((1, 0, 0, 0, 0, 1))
# Metrics
precision_macro = precision_score(y_true, y_pred, average='macro')
precision_weighted = precision_score(y_true, y_pred, average='weighted')
recall_macro = recall_score(y_true, y_pred, average='macro')
recall_weighted = recall_score(y_true, y_pred, average='weighted')
f1_macro = f1_score(y_true, y_pred, average='macro')
f1_weighted = f1_score(y_true, y_pred, average='weighted')
# Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()

The metrics calculated with Sklearn in this case are the following:

precision_macro = 0.25
precision_weighted = 0.25
recall_macro = 0.33333
recall_weighted = 0.33333
f1_macro = 0.27778
f1_weighted = 0.27778

And this is the confusion matrix:

[Confusion matrix plot: rows are the true labels (0, 1, 2), columns the predicted labels]

[[1 1 0]
 [1 1 0]
 [2 0 0]]

Are the macro and weighted averages the same because I have the same number of samples for each class? (See the quick check below.) This is what I did manually.
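
Checking the class balance with NumPy:

import numpy as np

y_true = np.array((1, 2, 2, 0, 1, 0))
# Each class appears exactly twice, so the per-class weights are equal
# and the weighted average reduces to the macro average.
print(np.bincount(y_true))  # [2 2 2]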

1 - Precision = TP/(TP+FP). So for classes 1 and 2, we get:

Precision1 = TP1/(TP1+FP1) = 1/(1+1) = 0.5
Precision2 = TP2/(TP2+FP2) = 0/(0+0) = 0 (this returns 0 according to the Sklearn documentation)
Precision_Macro = (Precision1 + Precision2)/2 = 0.25
Precision_Weighted = (2*Precision1 + 2*Precision2)/4 = 0.25
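
For reference, Sklearn can report the precision of every class at once via average=None (this assumes a scikit-learn version that supports the zero_division argument, i.e. 0.22+):

import numpy as np
from sklearn.metrics import precision_score

y_true = np.array((1, 2, 2, 0, 1, 0))
y_pred = np.array((1, 0, 0, 0, 0, 1))
# One precision per class, in label order 0, 1, 2.
# zero_division=0 returns 0 for class 2, which has no predicted samples.
print(precision_score(y_true, y_pred, average=None, zero_division=0))
# [0.25 0.5  0.  ]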

2 - Recall = TP/(TP+FN). So for classes 1 and 2, we get:

Recall1 = TP1/(TP1+FN1) = 1/(1+1) = 0.5
Recall2 = TP2/(TP2+FN2) = 0/(0+2) = 0
Recall_Macro = (Recall1+Recall2)/2 = (0.5+0)/2 = 0.25
Recall_Weighted = (2*Recall1+2*Recall2)/4 = (2*0.5+2*0)/4 = 0.25
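
As a sanity check, the per-class recall can also be read directly off the confusion matrix:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array((1, 2, 2, 0, 1, 0))
y_pred = np.array((1, 0, 0, 0, 0, 1))
cm = confusion_matrix(y_true, y_pred)
# Diagonal = true positives; row sums = actual samples per class.
print(cm.diagonal() / cm.sum(axis=1))  # [0.5 0.5 0. ]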

3 - F1 = 2*(Precision*Recall)/(Precision+Recall)

F1_Macro = 2*(Precision_Macro*Recall_Macro)/(Precision_Macro+Recall_Macro) = 0.25
F1_Weighted = 2*(Precision_Weighted*Recall_Weighted)/(Precision_Weighted+Recall_Weighted) = 0.25
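
Taking the harmonic mean of the averaged precision and recall reproduces neither of Sklearn's F1 values, whether I use my manual averages or Sklearn's:

# Harmonic mean of the *averaged* precision and recall:
p, r = 0.25, 0.25            # my manual macro values
print(2 * p * r / (p + r))   # 0.25
p, r = 0.25, 1/3             # Sklearn's macro values
print(2 * p * r / (p + r))   # 0.2857..., still not 0.27778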

So, the precision score is the same as Sklearn's, but recall and F1 are different. What did I do wrong here? Even if you use the precision and recall values from Sklearn (i.e., 0.25 and 0.33333), you can't get the 0.27778 F1 score.


Solution

  • For the averaged scores, you also need the score for class 0. The precision of class 0 is 1/4 (only one of the four samples predicted as 0 is truly 0), so the macro precision doesn't change: (1/4 + 1/2 + 0)/3 = 0.25. The recall of class 0 is 1/2, so the macro recall is (1/2 + 1/2 + 0)/3 = 1/3.

    The average F1 score is not the harmonic mean of the average precision and recall; rather, it is the average of the per-class F1 scores. Here, F1 for class 0 is 1/3, for class 1 it is 1/2, and for class 2 it is undefined but taken to be 0, for an average of (1/3 + 1/2 + 0)/3 = 5/18 ≈ 0.27778 (verified in the sketch below).
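
A minimal sketch verifying this, assuming a scikit-learn version that supports the zero_division argument (0.22+):

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array((1, 2, 2, 0, 1, 0))
y_pred = np.array((1, 0, 0, 0, 0, 1))

# Per-class scores over ALL three classes, including class 0.
p = precision_score(y_true, y_pred, average=None, zero_division=0)  # [0.25 0.5  0.  ]
r = recall_score(y_true, y_pred, average=None, zero_division=0)     # [0.5  0.5  0.  ]
f = f1_score(y_true, y_pred, average=None, zero_division=0)         # [0.333 0.5 0.  ]

# Macro = plain mean over the per-class scores; the macro F1 is the
# mean of the per-class F1's, not a harmonic mean of macro P and R.
print(p.mean(), r.mean(), f.mean())  # 0.25 0.3333... 0.27778 (= 5/18)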