I am probably making a very simple (and stupid) mistake here but I cannot figure it out. I am playing with some data from Kaggle (Digit Recognizer) and trying to use an SVM with the caret package to do some classification. If I just plug the label values into the function as type numeric, the train function in caret seems to default to regression and performance is quite poor. So what I tried next was to convert the labels to a factor with factor() and run SVM classification. Here is some code where I generate some dummy data and then plug it into caret:
library(caret)
library(doMC)
registerDoMC(cores = 4)
ytrain <- factor(sample(0:9, 1000, replace=TRUE))
xtrain <- matrix(runif(252 * 1000, 0, 255), 1000, 252)
preProcValues <- preProcess(xtrain, method = c("center", "scale"))
transformerdxtrain <- predict(preProcValues, xtrain)
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
svmFit <- train(transformerdxtrain[1:10,], ytrain[1:10], method = "svmradial")
I get this error:
Error in kernelMult(kernelf(object), newdata, xmatrix(object)[[p]], coef(object)[[p]]) :
dims [product 20] do not match the length of object [0]
In addition: Warning messages:
1: In train.default(transformerdxtrain[1:10, ], ytrain[1:10], method = "svmradial") :
At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X0, X1, X2, X3, X4, X5, X6, X7, X8, X9
2: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
There were missing values in resampled performance measures.
Can somebody tell me what I am doing wrong? Thank you!
You have 10 different classes, yet you are passing only 10 cases to train(). This means that when you resample, individual resamples will frequently not contain all 10 classes. train() then has difficulty combining the results of these varying-category SVMs.
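To see how often this happens, here is a small base-R sketch (no caret needed). It assumes bootstrap resampling with 25 repetitions, which is what train() uses when no trControl is supplied; the seed and the names y_small, n_present and classes_per_resample are only for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Same dummy labels as in the question, cut down to the 10 cases passed to train()
ytrain  <- factor(sample(0:9, 1000, replace = TRUE))
y_small <- ytrain[1:10]

# Classes actually present among those 10 cases (the factor still has 10 levels)
n_present <- nlevels(droplevels(y_small))

# Draw 25 bootstrap resamples and count the distinct classes in each one
classes_per_resample <- replicate(25, {
  idx <- sample(seq_along(y_small), replace = TRUE)
  nlevels(droplevels(y_small[idx]))
})

n_present
table(classes_per_resample)
```

Each resample can only contain a subset of the classes present in the 10 cases, so the individual SVMs are fit on differing sets of classes.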
You can fix this by some combination of increasing the number of cases, decreasing the number of classes, or even using a different classifier.
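As a rough sketch of the first option (more cases), something along these lines should give every resample a chance to see all 10 classes. It assumes the caret and kernlab packages are installed; the 5-fold CV setting, the seed and the px column names are my own choices for illustration, not something from your code. The method is spelled "svmRadial", as in caret's model list, and the control object is actually passed via trControl.

```r
library(caret)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Same dummy data as in the question, but all 1000 cases are used
ytrain <- factor(sample(0:9, 1000, replace = TRUE))
xtrain <- matrix(runif(252 * 1000, 0, 255), 1000, 252)
colnames(xtrain) <- paste0("px", seq_len(ncol(xtrain)))  # hypothetical column names

# Plain 5-fold CV keeps the run time modest; this is an assumption, not a recommendation
fitControl <- trainControl(method = "cv", number = 5)

# Centering and scaling are done inside train() so they are re-estimated per resample
svmFit <- train(x = xtrain, y = ytrain,
                method = "svmRadial",
                preProcess = c("center", "scale"),
                trControl = fitControl)
svmFit
```

On random labels the accuracy will of course be around chance level; the point is only that the fit completes without the kernelMult error.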