
Is there a workaround to get the train() function of the caret package to work from within RStudio?


I was working through the examples in the very nice book "Applied Predictive Modeling" by Max Kuhn and Kjell Johnson. Unfortunately, I got stuck on an example that uses the train() function and the GermanCredit dataset provided by the caret package to cross-validate a support vector machine:

library(AppliedPredictiveModeling)
library(caret)
# preparing the data
data(GermanCredit)
GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL
set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest  <- GermanCredit[-inTrain, ]

# Grid selection for `sigma` and `cost` tuning parameters:    
library(kernlab)
set.seed(231)
sigDist <- sigest(Class ~ ., data = GermanCreditTrain, frac = 1)
svmTuneGrid <- data.frame(.sigma = sigDist[1], .C = 2^(-2:7))

# SVM classification and cross-validation
svmFit <- train(Class ~ .,
                data = GermanCreditTrain,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = svmTuneGrid,
                trControl = trainControl(method = "repeatedcv", repeats = 5, 
                                         classProbs =  TRUE))  

It threw this error:

Error in comp(expr, env = envir, options = list(suppressUndefined = TRUE)) : 
  could not find function "makeCenv"

Sometimes I got this error message instead:

Loading required package: class
Warning: namespace ‘compiler’ is not available and has been replaced
by .GlobalEnv when processing object ‘GermanCredit’
Error in comp(expr, env = envir, options = list(suppressUndefined = TRUE)) : 
  could not find function "makeCenv"
In addition: Warning message:
executing %dopar% sequentially: no parallel backend registered

I then read that makeCenv() is in the doMC package, which was suggested as an alternative for parallel processing, but I would rather not use that package since, as far as I know, it is not available on Windows. Is there any alternative?

Update: These errors appeared only when the code was run in the RStudio IDE; everything worked in the default R console, so the problem seems to be specific to RStudio. The run took rather long in the R console (about 8 minutes), though, so I wonder how to speed things up given the hardware specs mentioned below.

My sessionInfo() output is here (RStudio):

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] datasets  grid      splines   utils     stats     graphics  grDevices methods  
[9] base     

other attached packages:
 [1] proxy_0.4-10                    e1071_1.6-1                    
 [3] class_7.3-9                     kernlab_0.9-19                 
 [5] caret_5.17-7                    foreach_1.4.1                  
 [7] AppliedPredictiveModeling_1.1-4 CORElearn_0.9.42               
 [9] rpart_4.1-3                     xtable_1.7-1                   
[11] knitr_1.5                       texreg_1.30                    
[13] pastecs_1.3-15                  boot_1.3-9                     
[15] gridExtra_0.9.1                 reshape2_1.2.2                 
[17] plyr_1.8                        scales_0.2.3                   
[19] ggplot2_0.9.3.1                 vcdExtra_0.5-11                
[21] gnm_1.0-6                       vcd_1.3-1                      
[23] corrplot_0.73                   RColorBrewer_1.0-5             
[25] car_2.0-19                      Hmisc_3.13-0                   
[27] Formula_1.1-1                   cluster_1.14.4                 
[29] xlsx_0.5.5                      xlsxjars_0.5.0                 
[31] rJava_0.9-5                     lmPerm_1.1-2                   
[33] coin_1.0-23                     survival_2.37-4                
[35] GPArotation_2012.3-1            psych_1.3.12                   
[37] sos_1.3-8                       brew_1.0-6                     
[39] data.table_1.8.10               mice_2.18                      
[41] nnet_7.3-7                      MASS_7.3-29                    
[43] lattice_0.20-23                

loaded via a namespace (and not attached):
 [1] codetools_0.2-8   colorspace_1.2-4  dichromat_2.0-0   digest_0.6.4     
 [5] evaluate_0.5.1    formatR_0.10      gtable_0.1.2      iterators_1.0.6  
 [9] labeling_0.2      Matrix_1.1-0      modeltools_0.2-21 munsell_0.4.2    
[13] mvtnorm_0.9-9996  proto_0.3-10      qvcalc_0.8-8      relimp_1.0-3     
[17] stats4_3.0.2      stringr_0.6.2     tcltk_3.0.2       tools_3.0.2      

sessionInfo() output from default R console:

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] datasets  grDevices grid      splines   graphics  utils     stats    
[8] methods   base     

other attached packages:
 [1] e1071_1.6-1     class_7.3-9     kernlab_0.9-19  caret_5.17-7   
 [5] foreach_1.4.1   cluster_1.14.4  lattice_0.20-23 reshape2_1.2.2 
 [9] plyr_1.8        scales_0.2.3    ggplot2_0.9.3.1 lmPerm_1.1-2   
[13] coin_1.0-23     survival_2.37-4 sos_1.3-8       brew_1.0-6     

loaded via a namespace (and not attached):
 [1] codetools_0.2-8    colorspace_1.2-4   compiler_3.0.2     dichromat_2.0-0   
 [5] digest_0.6.3       gtable_0.1.2       iterators_1.0.6    labeling_0.2      
 [9] MASS_7.3-29        modeltools_0.2-21  munsell_0.4.2      mvtnorm_0.9-9996  
[13] proto_0.3-10       RColorBrewer_1.0-5 stats4_3.0.2       stringr_0.6.2     
[17] tools_3.0.2       

Questions:

  1. There must be an interaction with RStudio, since the code worked well in the default R console. Comparing the two sessionInfo() outputs, the difference is the compiler package, which is loaded in the default R console but not in RStudio. Strangely, this package cannot be found on CRAN. I found a note here: http://www.inside-r.org/r-doc/compiler/compile saying that loading the compiler package would be enough, but when I tried that in RStudio it failed with this error message:

    Error: package ‘compiler’ was built before R 3.0.0: please re-install it

Update
It finally worked from within RStudio after I copied the compiler package directory from the default R library path to the RStudio library path, but the run still takes too long (about 8 minutes). I will post a separate question about parallel processing on Windows with the hardware below, if that would help to get an answer sooner.
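
The fix described in this update suggests that the two front ends were resolving the compiler package from different library directories. A rough sketch of how that could be checked, using only base R functions; the idea is to run it once in RStudio and once in the plain R console and compare the printed values. The variable names are illustrative, and the last check rests on the assumption (drawn from the "namespace 'compiler' is not available" warning above) that makeCenv is an internal function of the compiler package:

```r
# Library search path of the current session.
lib_paths <- .libPaths()
print(lib_paths)

# Directory from which the 'compiler' package is (or would be) loaded.
compiler_dir <- find.package("compiler")
print(compiler_dir)

# R version the installed copy was built under, next to the running R version.
built_under <- packageDescription("compiler")$Built
print(built_under)
print(R.version.string)

# Assumption to verify: does the compiler namespace define makeCenv?
# In a broken installation this line may itself fail, which is also informative.
has_makeCenv <- exists("makeCenv", envir = asNamespace("compiler"),
                       inherits = FALSE)
print(has_makeCenv)
```

If the directories or the "Built" fields differ between the two sessions, that would be consistent with the "was built before R 3.0.0" error shown above.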

  2. My laptop has a 2.1 GHz dual-core processor, 3 GB of RAM, and 32-bit Windows. Any idea how to do parallel processing with the train() function? Could you please post the R code for this? I would be very grateful indeed.

Solution

  • The caret code base is completely independent of doMC or any of the other "do" packages. I don't have a Windows system to test with here, but I am 99% sure that this is not a reproducible problem. The package is tested nightly in several places (e.g. R-Forge) and across 3-4 different operating systems, including Windows. I have never seen this issue come up, even when I've taught classes on the package to large audiences exclusively using Windows.

    My guess is that you accidentally called a doMC function somewhere (even though it was not listed in your sessionInfo() output).

    It would be helpful if someone else can try to reproduce this error.

    Thanks,

    Max
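
The answer above does not address question 2 (parallel processing on Windows). The warning in the question, "executing %dopar% sequentially: no parallel backend registered", indicates that train() runs its resampling loop through foreach, so registering a foreach backend should let it use both cores. A minimal sketch, assuming the doParallel package (not mentioned in the original post) is installed from CRAN; the worker count of 2 matches the dual-core laptop described in the question, and the variable names are illustrative:

```r
library(doParallel)  # assumption: installed from CRAN; also attaches foreach and parallel

# Start one worker process per core (a socket cluster, which works on Windows).
cl <- makeCluster(2)
registerDoParallel(cl)

# Confirm that foreach now sees the registered backend.
n_workers <- getDoParWorkers()
print(n_workers)

# Quick sanity check that %dopar% is dispatched to the workers.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2
print(squares)

# ... rerun the train() call from the question here, unchanged ...

# Shut the workers down and fall back to sequential execution.
stopCluster(cl)
registerDoSEQ()
```

Each worker holds its own copy of the data, so on a 32-bit system with 3 GB of RAM, memory rather than core count may turn out to limit the speed-up.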