Why does caret produce the error "Something is wrong; all the ROC metric values are missing"?


It seems that many people have had this problem for years. There are numerous questions addressing this issue. I've tried all of the solutions they suggest and none of them have worked for me. It would be nice to know what the underlying issue is here, as the error message has not been helpful.

  • Something is wrong; all the ROC metric values are missing:
  • caret - error - Something is wrong - all the ROC metric values are missing:
  • error in caret ROC metric : "Something is wrong; all the ROC metric values are missing"
  • Using metric ROC in caret train function in R
  • Issue using 'ROC' metric in caret train function in R

Here is a reproducible example from my code. I had to cut down the test data, but the error seems the same. The full data set has 44 predictors instead of 8, and 1800 observations instead of 30.

test_data <- structure(list(elevation = c(4L, 4L, 146L, 146L, 146L, 146L, 
146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 
146L, 146L, 146L, 146L, 146L, 204L, 291L, 291L, 291L, 291L, 413L, 
413L, 413L), stdev_elevation = c(0L, 0L, 3L, 3L, 3L, 2L, 3L, 
3L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 40L, 
52L, 52L, 52L, 52L, 91L, 91L, 91L), d2coast = c(9L, 8L, 142L, 
142L, 140L, 137L, 139L, 140L, 140L, 135L, 135L, 140L, 135L, 137L, 
135L, 137L, 135L, 135L, 140L, 137L, 134L, 132L, 3L, 10L, 10L, 
10L, 10L, 7L, 7L, 7L), lc_class = structure(c(12L, 12L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 9L, 9L, 9L, 9L, 9L, 2L, 2L, 2L), levels = c("Cryptogam barren complex (bedrock)", 
"Cryptogam, herb barren", "Erect dwarf-shrub tundra", "Graminoid, prostrate dwarf-shrub, forb tundra", 
"Low-shrub tundra", "Nontussock sedge, dwarf-shrub, moss tundra", 
"Prostrate dwarf-shrub, herb tundra", "Prostrate/Hemiprostrate dwarf-shrub tundra", 
"Rush/grass, forb, cryptogam tundra", "Sedge, moss, dwarf-shrub wetland", 
"Sedge, moss, low-shrub wetland", "Sedge/grass, moss wetland", 
"Tussock-sedge, dwarf-shrub, moss tundra"), class = "factor"), 
    substrate = c(3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L), elevation2 = c(18L, 18L, 21386L, 21386L, 
    21465L, 21360L, 21427L, 21465L, 21465L, 21430L, 21430L, 21465L, 
    21430L, 21360L, 21430L, 21360L, 21430L, 21430L, 21465L, 21360L, 
    21380L, 21334L, 41625L, 84836L, 84836L, 84836L, 84836L, 170996L, 
    170996L, 170996L), stdev_elevation2 = c(0L, 0L, 13L, 13L, 
    10L, 8L, 10L, 10L, 10L, 7L, 7L, 10L, 7L, 8L, 7L, 8L, 7L, 
    7L, 10L, 8L, 6L, 5L, 1644L, 2723L, 2723L, 2723L, 2723L, 8418L, 
    8418L, 8418L), d2coast2 = c(81L, 77L, 20236L, 20236L, 19753L, 
    18932L, 19479L, 19753L, 19753L, 18449L, 18449L, 19753L, 18449L, 
    18932L, 18449L, 18932L, 18449L, 18449L, 19753L, 18932L, 18064L, 
    17678L, 11L, 100L, 100L, 100L, 100L, 52L, 52L, 52L), presence = c("no", 
    "yes", "no", "no", "no", "no", "no", "no", "no", "no", "no", 
    "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", 
    "no", "no", "no", "no", "no", "no", "no", "no", "no"), Region_Code = c(3L, 
    3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
    10L)), row.names = c(NA, -30L), class = c("tbl_df", "tbl", 
"data.frame"))

Here is the model I'm having issues with:

library(caret)
library(CAST)
library(pROC)


test_data <- as.data.frame(test_data)

indices <- CreateSpacetimeFolds(test_data, spacevar = "Region_Code", k = 3)


pred <- test_data[,1:8]
obs <- test_data[,9] 


##### doesn't work

model1 <- ffs(predictors = pred, 
           response = obs,
           trControl = trainControl(method = 'cv', number = 12, summaryFunction = twoClassSummary, classProbs = TRUE,  savePredictions = TRUE),
           minVar = 2,
           method = 'glm', 
           family = 'binomial', 
           metric = 'ROC',
           index = indices$index)



#[1] "model using elevation,stdev_elevation will be trained now..."
#Something is wrong; all the ROC metric values are missing:
#      ROC           Sens          Spec    
# Min.   : NA   Min.   : NA   Min.   : NA  
# 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
# Median : NA   Median : NA   Median : NA  
# Mean   :NaN   Mean   :NaN   Mean   :NaN  
# 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
# Max.   : NA   Max.   : NA   Max.   : NA  
# NA's   :1     NA's   :1     NA's   :1    
#Error: Stopping
#In addition: There were 13 warnings (use warnings() to see them)
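The last line of that output is the most informative one. caret reports each resample whose model fit failed as a warning, so those 13 warnings should contain the actual error behind the missing ROC values. In an interactive session, running warnings() immediately after the failed call prints them. If you want to keep them programmatically, the following is a minimal sketch of a hypothetical helper, collect_warnings() (not part of caret or CAST), built only on base R condition handling. The messages in the toy call are made up for illustration and are not caret's exact wording.

```r
# Hypothetical helper (not part of caret or CAST): evaluate an expression,
# record every warning it raises, and return an error object instead of stopping.
collect_warnings <- function(expr) {
  messages <- character(0)
  value <- withCallingHandlers(
    tryCatch(expr, error = function(e) e),
    warning = function(w) {
      # Keep the warning text, then let evaluation continue
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = messages)
}

# Toy stand-in for a failing train() call: every resample warns, then it stops.
res <- collect_warnings({
  for (fold in c("Fold1", "Fold2", "Fold3")) {
    warning("fit failed in ", fold, " (toy message)")
  }
  stop("Stopping")
})
print(res$warnings)
```

With the real model, the call would be res <- collect_warnings(ffs(...)) with the same arguments as model1, followed by res$warnings.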



##### works fine if we remove the index

model2 <- ffs(predictors = pred, 
              response = obs,
              trControl = trainControl(method = 'cv', number = 12, summaryFunction = twoClassSummary, classProbs = TRUE,  savePredictions = TRUE),
              minVar = 2,
              method = 'glm', 
              family = 'binomial', 
              metric = 'ROC')



#[1] "model using elevation,stdev_elevation will be trained now..."
#[1] "maximum number of models that still need to be trained: 48"
#[1] "model using elevation,d2coast will be trained now..."
#[1] "maximum number of models that still need to be trained: 47"
#[1] "model using elevation,lc_class will be trained now..."
#[1] "maximum number of models that still need to be trained: 46"
#[1] "model using elevation,substrate will be trained now..."
#[1] "maximum number of models that still need to be trained: 45"
#[1] "model using elevation,elevation2 will be trained now..."
#[1] "maximum number of models that still need to be trained: 44"
#[1] "model using elevation,stdev_elevation2 will be trained now..."
#[1] "maximum number of models that still need to be trained: 43"
#[1] "model using elevation,d2coast2 will be trained now..."
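For reference, the indices object comes from CreateSpacetimeFolds(), which returns lists of row numbers (index for the rows used to fit, indexOut for the rows held out) in the format trainControl() accepts. The base-R sketch below builds leave-one-region-out folds in that same layout. It is an assumed stand-in for illustration (make_region_folds() is a hypothetical name), presuming that k = 3 with three distinct Region_Code values holds out one region per fold; it is not CAST's implementation.

```r
# Sketch only: leave-one-region-out folds in the index/indexOut layout that
# trainControl() accepts. Assumed stand-in, not CAST::CreateSpacetimeFolds().
make_region_folds <- function(region) {
  groups <- sort(unique(region))
  # Rows held out in each fold: all rows belonging to one region
  index_out <- lapply(groups, function(g) which(region == g))
  # Rows used for fitting in each fold: every other row
  index <- lapply(index_out, function(held_out) setdiff(seq_along(region), held_out))
  fold_names <- paste0("Fold", seq_along(groups))
  names(index) <- fold_names
  names(index_out) <- fold_names
  list(index = index, indexOut = index_out)
}

# Region_Code and presence, transcribed from the reduced test_data above
region <- c(rep(3L, 2), rep(6L, 20), rep(10L, 8))
presence <- c("no", "yes", rep("no", 28))

folds <- make_region_folds(region)

# Which classes appear in each held-out fold
held_out_classes <- sapply(folds$indexOut, function(i) {
  paste(sort(unique(presence[i])), collapse = ",")
})
print(held_out_classes)
```

In the reduced data the only "yes" row is in Region_Code 3, so two of the three held-out regions contain a single class, and the fold that holds out region 3 has no "yes" rows to fit on. twoClassSummary() should return NA for ROC on a single-class hold-out, which matters for the toy data even once the folds are passed correctly.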


I'm interested in two things:

  1. Why am I getting this error specifically for this model?

  2. What does this error message mean in general? Knowing that the ROC metric values are missing has not helped me, or the people in the Stack Overflow questions listed above, figure out what is wrong with the model. I haven't been able to identify a common theme in the solutions that have been suggested.

Relevant session info:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pROC_1.18.0     CAST_0.7.1      caret_6.0-93    lattice_0.20-45 ggplot2_3.4.1  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0     terra_1.7-3          purrr_1.0.1          reshape2_1.4.4       listenv_0.9.0        splines_4.2.2        colorspace_2.1-0    
 [8] vctrs_0.5.2          generics_0.1.3       stats4_4.2.2         utf8_1.2.3           survival_3.4-0       prodlim_2019.11.13   rlang_1.0.6         
[15] ModelMetrics_1.2.2.2 pillar_1.8.1         glue_1.6.2           withr_2.5.0          foreach_1.5.2        lifecycle_1.0.3      plyr_1.8.8          
[22] lava_1.7.2.1         stringr_1.5.0        timeDate_4022.108    munsell_0.5.0        gtable_0.3.1         future_1.31.0        recipes_1.0.5       
[29] codetools_0.2-18     parallel_4.2.2       class_7.3-20         fansi_1.0.4          Rcpp_1.0.10          scales_1.2.1         ipred_0.9-13        
[36] parallelly_1.34.0    digest_0.6.31        stringi_1.7.12       dplyr_1.1.0          grid_4.2.2           hardhat_1.2.0        cli_3.6.0           
[43] tools_4.2.2          magrittr_2.0.3       tibble_3.1.8         future.apply_1.10.0  pkgconfig_2.0.3      MASS_7.3-58.1        Matrix_1.5-3        
[50] data.table_1.14.8    lubridate_1.9.2      timechange_0.2.0     gower_1.0.1          rstudioapi_0.14      iterators_1.0.14     R6_2.5.1            
[57] globals_0.16.2       rpart_4.1.19         nnet_7.3-18          nlme_3.1-160         compiler_4.2.2 



Solution

  • It turns out I made a mistake in the syntax: index is an argument of trainControl(), not of ffs() or train(), so it has to be passed inside trControl:

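    # index belongs to trainControl(). When it is supplied, caret resamples on
    # these region-based folds, so method = 'cv' and number = 12 no longer define the folds.
    model1 <- ffs(predictors = pred,
                  response = obs,
                  trControl = trainControl(method = 'cv', number = 12,
                                           summaryFunction = twoClassSummary,
                                           classProbs = TRUE, savePredictions = TRUE,
                                           index = indices$index),
                  minVar = 2,
                  method = 'glm',
                  family = 'binomial',
                  metric = 'ROC')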
    
    

    This is quite an unhelpful error message: the questions linked above all had very different underlying issues, and this appears to be another example.
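A likely explanation for why the misplaced argument produces this particular message, stated as an assumption about how the arguments are forwarded: extra arguments given to ffs() are passed on to train(), and caret's 'glm' method hands them to stats::glm(). glm() turns its own extra arguments into a glm.control() call, and glm.control() has no index argument, so every resampled fit fails, no hold-out predictions exist, and twoClassSummary() has no ROC values to summarize. The small base-R reproduction below shows only the glm() part; the toy data frame is made up.

```r
# Assumption: caret's "glm" method forwards extra train()/ffs() arguments to
# stats::glm(), so a misplaced `index = ...` ends up in glm()'s `...`.
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = factor(c("no", "yes", "no", "yes", "yes", "no"))
)

# glm() builds glm.control(...) from its extra arguments; glm.control() has no
# `index` argument, so the call errors before any model is fitted.
fit <- try(
  glm(y ~ x, family = binomial, data = df, index = list(Fold1 = 1:3)),
  silent = TRUE
)
cat(conditionMessage(attr(fit, "condition")), "\n")

# The same model without the stray argument fits normally.
ok <- glm(y ~ x, family = binomial, data = df)
print(coef(ok))
```

caret catches such failures per resample and turns them into warnings, which is why the only visible symptom is the generic "all the ROC metric values are missing" message followed by "There were 13 warnings".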