
"Something is wrong; all the Accuracy metric values are missing:"


I took the following code from the textbook "Machine Learning with R" by Brett Lantz and copied it into the console exactly as printed:

> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> library(kernlab)

Attaching package: ‘kernlab’

The following object is masked from ‘package:ggplot2’:

alpha

> set.seed(300)
> ctrl <- trainControl(method = "cv", number = 10)
> bagctrl <- bagControl(fit = svmBag$fit, predict = svmBag$pred, aggregate = svmBag$aggregate)
> setwd("~/2148OS_code/chapter 11")
> credit <- read.csv("credit.csv")
> svmbag <- train(default ~ ., data = credit, "bag", trControl = ctrl, bagControl = bagctrl)

I get the following response. What's wrong?

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The warnings are

> warnings()
Warning messages:
1: In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 3,6,10,13,17,19,23,24,26,27,30,32,34,36,38,41,42,45,49,54,59,60,61,64,66,69,71,72,77,80,81,90,95,102,103,106,112,114,117,118,122,125,127,132,133,137,139,141,143,146,148,151,152,155,158,161,174,176,178,181,185,187,188,189,191,194,203,208,210,212,215,216,218,219,221,223,225,229,230,235,236,239,245,246,262,266,269,271,272,276,279,282,283,285,286,287,288,296,299,305,308,309,313,314,315,317,318,319,322,323,327,328,330,332,333,335,336,338,339,343,347,349,350,352,354,358,360,361,363,366,367,368,369,371,377,379,387,389,392,394,396,397,399,400,410,412,413,414,421,425,428,437,438,441,443,445,446,448,451,453,461,467,469,471,479,481,482,484,486,487,489,491,493,503,504,506,508,511,512,514,517,519,521,522,524,529,530,532,534,537,538,545,547,550,552,555,562,570,579,582,584,588,589,590,601,606,608,610,611,614,615,618,619,623,627,628,629,630,632,634,636,638,641,642,645,653,656,659,660,661,663,667,669,672,673,676,679,681,686,687,690,693,700,701,702,707,708,721,722,724,725,728, [... truncated]
2: In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 3,5,8,9,13,15,18,21,25,27,29,33,36,37,41,44,45,51,52,53,55,59,60,64,66,67,72,76,77,80,91,92,96,97,102,103,104,107,110,111,113,116,119,121,122,123,127,130,133,136,139,140,143,145,147,148,149,154,158,160,164,166,168,169,171,174,176,177,178,180,182,185,187,195,199,203,205,216,218,220,223,226,231,234,236,237,238,242,245,2

Solution

  • I used the code provided by Packt for the second edition.

    If you set up parallel processing, the warnings will disappear. You will still end up with the error about the missing accuracy metrics.
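
    One way to set up parallel processing is sketched below. It assumes the doParallel package is installed; the answer does not say which backend was used, and the worker count of 2 is arbitrary.

    ```r
    # Sketch: register a parallel backend before calling train().
    # Assumption: doParallel is installed; 2 workers is an arbitrary choice.
    library(doParallel)

    cl <- makePSOCKcluster(2)        # start two local worker processes
    registerDoParallel(cl)           # caret's resampling loop runs through foreach
    n_workers <- getDoParWorkers()   # number of workers foreach will use

    # ... run the train() call from the question here ...

    stopCluster(cl)                  # shut the workers down when finished
    registerDoSEQ()                  # switch foreach back to sequential execution
    ```

    Warnings raised inside the worker processes are not relayed to the console, which would explain why they seem to vanish without the underlying problem being fixed.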

    caret reports this error when the performance measure is missing for every resample. Missing values in resampled performance measures can occur when a resample contains zero samples of one of the outcome classes (here, default), which leaves sensitivity or specificity undefined.
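
    To illustrate how a metric becomes undefined when a class is absent, here is a minimal base-R sketch. The vectors `obs` and `pred` are made-up data for a hypothetical held-out fold, not values from credit.csv.

    ```r
    # Hypothetical held-out fold in which nobody defaulted (illustrative data only).
    obs  <- factor(c("no", "no", "no", "no"),  levels = c("no", "yes"))
    pred <- factor(c("no", "yes", "no", "no"), levels = c("no", "yes"))

    # Confusion table: rows are predictions, columns are observed classes.
    tab <- table(predicted = pred, observed = obs)

    # Sensitivity for "yes" = true positives / all observed positives.
    # With no observed "yes" cases this is 0 / 0.
    sens_yes <- tab["yes", "yes"] / sum(tab[, "yes"])
    is.nan(sens_yes)
    ```

    Because the denominator is zero, the ratio is NaN rather than a number, and a summary over such resamples shows missing values.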

    I also ran a test with the GermanCredit data included in the caret package, and it produces the same error.
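
    A test along those lines might look like the sketch below. It is a guess at the setup, not the exact code that was run: it reuses the control objects from the question and swaps in GermanCredit, whose outcome column is `Class`. The `tryCatch` wrapper is an addition so that the script keeps going and the condition can be inspected.

    ```r
    # Sketch: repeat the question's call on caret's bundled GermanCredit data.
    # Assumptions: caret and kernlab are installed; same settings as the question.
    library(caret)
    library(kernlab)

    data(GermanCredit)   # data frame with a two-level factor outcome, Class

    set.seed(300)
    ctrl    <- trainControl(method = "cv", number = 10)
    bagctrl <- bagControl(fit = svmBag$fit,
                          predict = svmBag$pred,
                          aggregate = svmBag$aggregate)

    # Capture the outcome instead of stopping, so it can be examined afterwards.
    result <- tryCatch(
      train(Class ~ ., data = GermanCredit, method = "bag",
            trControl = ctrl, bagControl = bagctrl),
      error = function(e) e
    )

    # Either a fitted "train" object or the error condition that was raised.
    class(result)
    ```

    If `result` is an error, `conditionMessage(result)` shows its text. The behaviour may differ between caret versions.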