r, dplyr, r-caret, confusion-matrix, yardstick

How to pass a tibble to caret::confusionMatrix()?


Consider this simple example:

library(tibble)

df <- tibble(truth = c(1,1,0,0),
             prediction = c(1,0,1,0),
             n_obs = c(100,10,90,50))
df
# A tibble: 4 x 3
  truth prediction n_obs
  <dbl>      <dbl> <dbl>
1     1          1   100
2     1          0    10
3     0          1    90
4     0          0    50

I would like to pass this tibble to caret::confusionMatrix() so that I get all the metrics I need at once (accuracy, recall, etc.).

As you can see, the tibble contains all the information required to compute performance statistics. For instance, you can see that in the test dataset (not available here), there are 100 observations where the predicted label 1 matched the true label 1. However, 90 observations with a predicted value of 1 were actually false positives.
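To make the reading of each row explicit, here is a minimal base R sketch that labels the four rows with their confusion-matrix cell. It assumes 1 is the positive class, and the `outcome` column is a name introduced here purely for illustration.

```r
# Counts copied from the tibble above (a plain data.frame is enough here).
df <- data.frame(
  truth      = c(1, 1, 0, 0),
  prediction = c(1, 0, 1, 0),
  n_obs      = c(100, 10, 90, 50)
)

# Label each row with its confusion-matrix cell, treating 1 as the positive class.
# `outcome` is a hypothetical helper column, not part of the original data.
df$outcome <- with(df, ifelse(
  truth == 1,
  ifelse(prediction == 1, "TP", "FN"),
  ifelse(prediction == 1, "FP", "TN")
))

print(df)
```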

I do not want to compute all the metrics by hand and would rather rely on caret::confusionMatrix().

However, this has proven to be surprisingly difficult. Calling confusionMatrix(.) on the tibble above does not work. Is there a solution?

Thanks!


Solution

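  • You could build a weighted contingency table with xtabs() and pass that to confusionMatrix(), with the predictions in the rows and the truth in the columns. You have to set positive = "1"; otherwise 0, the first level, will be taken as the positive class.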

    confusionMatrix(xtabs(n_obs ~ prediction + truth, data = df), positive = "1")

    Confusion Matrix and Statistics
    
              truth
    prediction   0   1
             0  50  10
             1  90 100
    
                   Accuracy : 0.6             
                     95% CI : (0.5364, 0.6612)
        No Information Rate : 0.56            
        P-Value [Acc > NIR] : 0.1128          
    
                      Kappa : 0.247           
     Mcnemar's Test P-Value : 2.789e-15       
    
                Sensitivity : 0.9091          
                Specificity : 0.3571          
             Pos Pred Value : 0.5263          
             Neg Pred Value : 0.8333          
                 Prevalence : 0.4400          
             Detection Rate : 0.4000          
       Detection Prevalence : 0.7600          
          Balanced Accuracy : 0.6331          
    
           'Positive' Class : 1
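
As a sanity check on the layout of the table, the headline statistics can be recomputed from the same xtabs() result in base R. This is only a sketch under the assumption that 1 is the positive class; the names `tab`, `tp`, `fn`, `fp`, `tn` and `metrics` are introduced here for illustration and are not part of caret.

```r
# Counts copied from the question; `df` mirrors the tibble used above.
df <- data.frame(
  truth      = c(1, 1, 0, 0),
  prediction = c(1, 0, 1, 0),
  n_obs      = c(100, 10, 90, 50)
)

# Weighted contingency table: rows are predictions, columns are the truth.
tab <- xtabs(n_obs ~ prediction + truth, data = df)

# Pull out the four cells, treating "1" as the positive class.
tp <- tab["1", "1"]  # predicted 1, truly 1
fn <- tab["0", "1"]  # predicted 0, truly 1
fp <- tab["1", "0"]  # predicted 1, truly 0
tn <- tab["0", "0"]  # predicted 0, truly 0

metrics <- c(
  accuracy    = (tp + tn) / sum(tab),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  ppv         = tp / (tp + fp),
  npv         = tn / (tn + fn)
)

print(round(metrics, 4))
```

The rounded values should agree with the Accuracy, Sensitivity, Specificity, Pos Pred Value and Neg Pred Value lines printed by confusionMatrix() above. If they do not, the rows and columns of the table are probably swapped.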