python machine-learning scikit-learn precision-recall

UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples


When I use the following code to compute precision_recall_fscore_support for a single class (only the 1s):

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# build the true and predicted label arrays
ytrue = np.array(['1', '1', '1', '1', '1', '1', '1', '1'])
ypred = np.array(['0', '0', '0', '1', '1', '1', '1', '1'])

# keep only the samples whose true label is "1"
y_true, y_pred = zip(*[[ytrue[i], ypred[i]] for i in range(len(ytrue)) if ytrue[i] == "1"])

# get the scores
precision_recall_fscore_support(y_true, y_pred, average='weighted')

I get the following warning:

UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.
  'recall', 'true', average, warn_for)

and output:

(1.0, 0.625, 0.76923076923076927, None)

I found the SO thread "UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples", which has a similar warning, but I don't think it applies to my problem.

Question: Are the results in my output valid, or should I be concerned about the warning message? If so, what is wrong with my code and how do I fix it?


Solution

  • You need to use:

    cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
    

    I'm using k-NN, and this solved the problem for me.

    Code:

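    from sklearn.model_selection import ShuffleSplit, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def knn(self, X_train, X_test, Y_train, Y_test):
        # fit the k-NN classifier
        knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, Y_train)
        # 10-split cross-validation: 10 random shuffles, 30% held out each time
        cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
        # mean weighted F1 over the 10 splits
        puntajes = sum(cross_val_score(knn, X_test, Y_test,
                                       cv=cv, scoring='f1_weighted')) / 10

        print(puntajes)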
    

    Documentation: https://scikit-learn.org/stable/modules/cross_validation.html
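
For the exact arrays in the question, the warning seems to come from label '0': it shows up in the predictions but never in the true labels, so its recall is 0/0. With average='weighted' each label is weighted by its support, and the support of '0' is 0, so the reported numbers are simply the scores for label '1'. Below is a minimal sketch of two ways to check this, assuming only label '1' is of interest and scikit-learn 0.22 or later (needed for the zero_division argument). The variable names are illustrative, not taken from the original post.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Same data as in the question: every true label is '1'.
y_true = np.array(['1'] * 8)
y_pred = np.array(['0', '0', '0', '1', '1', '1', '1', '1'])

# Per-label breakdown. Label '0' has no true samples, so its recall is
# undefined; zero_division=0 sets it to 0 without emitting the warning.
per_label = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)
supports = per_label[3]  # number of true samples per label, order ['0', '1']

# Restrict the computation to label '1' so that '0' is never scored.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=['1'], average='weighted'
)

print(supports)               # support of '0' is 0, support of '1' is 8
print(precision, recall, f1)  # same values as the output in the question
```

Passing labels=['1'] keeps the metric definition explicit, whereas zero_division=0 only silences the warning. Which one fits better depends on whether the other label should be reported at all.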