
caret function 'train' failing for bagged svm


I am using the Bioconductor package MLSeq on Ubuntu with R version 3.1.2. I ran through the example provided by the package, and that works just fine. However, I want to use the bagsvm method for the classify function, so at chunk 14 I changed the code from

svm <- classify(data = data.trainS4, method = "svm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T") 

to

bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

which produced the error:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa   
 Min.   : NA   Min.   : NA 
 1st Qu.: NA   1st Qu.: NA 
 Median : NA   Median : NA 
 Mean   :NaN   Mean   :NaN 
 3rd Qu.: NA   3rd Qu.: NA 
 Max.   : NA   Max.   : NA 
 NA's   :1     NA's   :1   
Error in train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit,  :
  Stopping
In addition: There were 17 warnings (use warnings() to see them)

The warnings were:

 Warning messages:
1: executing %dopar% sequentially: no parallel backend registered
2: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars,  :
  task 1 failed - "could not find function "lev""

Warning 2 then appeared 14 more times (cv = 5 and rpt = 3 give 15 model fits in total), followed by:

17: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

traceback() produced

4: stop("Stopping")
3: train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, 
       predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, 
       ...)
2: train(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, 
       predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, 
       ...)
1: classify(data = data.trainS4, method = "bagsvm", normalize = "deseq", 
       deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

I thought the problem might be that the kernlab library, which I think the MLSeq code uses, had not been loaded, so I tried

library(kernlab)
bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

which resulted in the same error, but the warnings changed to:

Warning messages:
    1: In eval(expr, envir, enclos) :
      model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars,  :
      task 1 failed - "no applicable method for 'predict' applied to an object of class "c('ksvm', 'vm')""

This warning appeared 15 times, followed by:

16: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

I don't believe this problem is specific to MLSeq, because I also tried calling the train function directly:

ctrl <- trainControl(method = "repeatedcv", number = 5, 
    repeats = 3)
train <- train(counts, conditions, method = "bag", B = 100, 
           bagControl = bagControl(fit = svmBag$fit, predict = svmBag$pred, 
                                   aggregate = svmBag$aggregate), trControl = ctrl)

Here counts is a data frame with the RNA-Seq data and conditions is a factor with the classes. I got exactly the same results. Any help is much appreciated.


Solution

  • While trying to debug my problem, I seem to have inadvertently found a solution. Since the problem seemed to be in the predict function, I stored the svmBag$pred function, written out in full, as a variable predfunct so I could see where it was failing:

    predfunct <- function(object, x) {
      if (is.character(lev(object))) {
        out <- predict(object, as.matrix(x), type = "probabilities")
        colnames(out) <- lev(object)
        rownames(out) <- NULL
      } else {
        out <- predict(object, as.matrix(x))[, 1]
      }
      out
    }
    

    and then calling

    train <- train(counts, conditions, method = "bag", B = 100, 
           bagControl = bagControl(fit = svmBag$fit, predict = predfunct, 
                                   aggregate = svmBag$aggregate), trControl = ctrl)
    

    This is the same call as in the last code block of the problem description, with predfunct replacing svmBag$pred. Somehow this fixed the problem, and everything now runs just fine. If anyone can figure out why this works, and preferably find a solution that isn't such a kluge, I will make your response the answer.
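
One plausible reading of why the copy works, which the post itself does not confirm: a function looks up names such as predict and lev starting from the environment it was defined in. svmBag$pred was defined inside a package, so it may find the S3 generic stats::predict first, which knows nothing about S4 methods for ksvm objects. The pasted copy predfunct is defined in the workspace, where an attached kernlab masks predict with its own S4 generic. The sketch below reproduces that effect with base R only. The class toyModel and the functions pred_pkg, pred_user and predfunct_ns are hypothetical names made up for illustration, and the stats namespace merely stands in for the package namespace. The last function is a possible less kludgy variant that qualifies the calls explicitly; it assumes kernlab is installed and exports both lev and predict.

```r
library(methods)

# Toy S4 class standing in for kernlab's "ksvm" (hypothetical, illustration only).
setClass("toyModel", representation(levels = "character"))

# An S4 generic named predict, created in the workspace. This stands in for
# the S4 generic that an attached kernlab makes visible on the search path.
setGeneric("predict")
setMethod("predict", "toyModel", function(object, ...) object@levels)

# Same body, two different enclosing environments.
pred_pkg <- function(object) predict(object)
# Stand-in for a function defined inside a package: it sees stats::predict first.
environment(pred_pkg) <- asNamespace("stats")

# Defined in the workspace: it sees the S4 generic created above.
pred_user <- function(object) predict(object)

m <- new("toyModel", levels = c("N", "T"))

# The packaged version fails, because the S3 generic has no method for toyModel;
# the error message is captured as a string instead of stopping the script.
pkg_result <- tryCatch(pred_pkg(m), error = function(e) conditionMessage(e))

# The workspace version dispatches to the S4 method.
user_result <- pred_user(m)

# Possible cleaner variant of predfunct: qualify the kernlab calls so the
# lookup no longer depends on where the function was defined.
# Assumption: kernlab is installed and exports lev and predict.
predfunct_ns <- function(object, x) {
  if (is.character(kernlab::lev(object))) {
    out <- kernlab::predict(object, as.matrix(x), type = "probabilities")
    colnames(out) <- kernlab::lev(object)
    rownames(out) <- NULL
  } else {
    out <- kernlab::predict(object, as.matrix(x))[, 1]
  }
  out
}
```

If this reading is right, passing predict = predfunct_ns to bagControl should behave like the predfunct workaround without requiring library(kernlab) to be attached first, though that has not been verified against the MLSeq example.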