I'm trying to better understand how `train(tuneLength = )` works in {caret}. My confusion arose when I tried to understand some of the differences between the SVM methods from {kernlab}.
I've reviewed the documentation (here) and the caret training page (here).
My toy example creates five models using the `iris` dataset. Results are here, and reproducible code is here (they're rather long, so I didn't paste them into the post).
From the {caret} documentation:

`tuneLength`: an integer denoting the amount of granularity in the tuning parameter grid. By default, this argument is the number of levels for each tuning parameters that should be generated by train. If trainControl has the option search = "random", this is the maximum number of tuning parameter combinations that will be generated by the random search. (NOTE: If given, this argument must be named.)
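Taken at face value, that definition implies very different row counts for the two search modes. The base-R sketch below is only an illustration, not caret's code: the parameter names `a` and `b` and their ranges are hypothetical. It builds a literal grid with `len` levels per parameter and, separately, a random draw of `len` combinations.

```r
# Two readings of tuneLength, taken literally from the documentation.
# Parameter names (a, b) and ranges are hypothetical placeholders.
len <- 4

# Grid search: `len` levels for each tuning parameter, fully crossed
grid_search <- expand.grid(
  a = seq(0.1, 1, length.out = len),
  b = seq(1, 10, length.out = len)
)

# Random search: at most `len` parameter combinations in total
set.seed(1)
random_search <- data.frame(
  a = runif(len, min = 0.1, max = 1),
  b = runif(len, min = 1, max = 10)
)

nrow(grid_search)    # 16 = len^2
nrow(random_search)  # 4  = len
```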
In this example I used `trainControl(search = "random")` and `train(tuneLength = 30)`, but there appear to be 67 results rather than 30 (the stated maximum number of tuning parameter combinations). I checked whether there might be 30 unique `ROC` values, or 30 unique `ydim` values, but by my count there are not.
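A more direct check than counting unique metric values is to count distinct combinations of the tuning-parameter columns. A minimal sketch, assuming `results` is a data frame shaped like `fit$results` (one column per tuning parameter plus metric columns); the hypothetical helper `count_combinations` and the toy values are made up purely for illustration.

```r
# Count distinct tuning-parameter combinations in a train()-style
# results table. `count_combinations` is a hypothetical helper.
count_combinations <- function(results, params) {
  nrow(unique(results[, params, drop = FALSE]))
}

# Toy stand-in for fit$results; values are invented for illustration
toy_results <- data.frame(
  xdim = c(2, 2, 3, 3),
  ydim = c(3, 3, 4, 5),
  ROC  = c(0.91, 0.91, 0.93, 0.95)
)

nrow(toy_results)                                   # 4 rows
count_combinations(toy_results, c("xdim", "ydim"))  # 3 distinct combinations
```

With a real fit, something like `count_combinations(fit$results, names(fit$bestTune))` should work, assuming `fit$bestTune` holds one column per tuning parameter.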
For the toy example, I created the following table:
Is there a way to see what's going on "under the hood"? For instance, M1 (`svmRadial`) and M3 (`svmRadialSigma`) both take, and are given, the same tuning parameters, but judging by `$results` they appear to use them differently. Why?
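One way to look under the hood is to print each method's model definition, which includes the function caret uses to build the tuning grid. The sketch below assumes the caret package is installed; `modelLookup()` and `getModelInfo()` are caret functions, and the `grid` element is the grid-building function for that method.

```r
# Inspect caret's model definitions for two RBF SVM methods.
library(caret)

# Tuning parameters each method declares
modelLookup("svmRadial")
modelLookup("svmRadialSigma")

# Full model definitions (exact name match, so regex = FALSE)
radial_info <- getModelInfo("svmRadial", regex = FALSE)[[1]]
sigma_info  <- getModelInfo("svmRadialSigma", regex = FALSE)[[1]]

# Printing the grid functions shows how tuneLength (passed as `len`)
# is turned into candidate values for each method
radial_info$grid
sigma_info$grid
```

Calling a grid function directly, e.g. `sigma_info$grid(x, y, len = 9, search = "grid")`, should return the candidate grid itself. This assumes the `(x, y, len, search)` argument layout used by caret's model definitions, and the RBF methods need kernlab installed because their grids call `sigest`.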
My understanding of `train(tuneLength = 9)` was that both models would produce results for `sigma` and `C` with 9 values each, i.e. 9 × 9 combinations, since 9 is the number of levels for each tuning parameter (random search being the exception). Similarly, shouldn't M4 give 9^3 combinations, since `tuneLength = 9` and there are 3 tuning parameters?
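The arithmetic behind that expectation can be written out with `expand.grid()`. This is only the literal "`len` levels per parameter, fully crossed" reading; the level values are placeholders, and `third` is a hypothetical name because the post does not say which third parameter M4 tunes.

```r
# Literal reading: every tuning parameter gets `len` levels,
# and all levels are crossed. Values are placeholders.
len <- 9

two_params <- expand.grid(sigma = seq_len(len), C = seq_len(len))

# `third` is a hypothetical stand-in for M4's third parameter
three_params <- expand.grid(sigma = seq_len(len), C = seq_len(len),
                            third = seq_len(len))

nrow(two_params)    # 81  = 9^2
nrow(three_params)  # 729 = 9^3
```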
I need to update the package documentation, but the details are spelled out on the package web page for random search:

The total number of unique combinations is specified by the `tuneLength` option to `train`.
However, this is particularly muddy for SVMs using the RBF kernel. Here is a rundown:

- `svmRadial` tunes over cost and uses a single value of `sigma` based on kernlab's `sigest` function. For grid search, `tuneLength` is the number of cost values to test; for random search, it is the total number of (cost, `sigma`) pairs to evaluate.
- `svmRadialCost` is the same as `svmRadial`, but `sigest` is run inside each resampling loop. For random search, it does not tune over `sigma`.
- `svmRadialSigma` with grid search tunes over both cost and `sigma`. In a moment of sub-optimal cognitive performance, I set this up to try at most 6 values of `sigma` during grid search, since I felt that the cost space needed a wider range. For random search, it does the same as `svmRadial`.
- `svmRadialWeight` is the same as `svmRadial` but also considers class weights, and is for 2-class problems only.

As for the SOM example on the web page, well, that's a bug. I over-sample the SOM parameter space since there needs to be a filter for `xdim <= ydim & xdim*ydim < nrow(x)`. The bug is that I don't then keep the right number of parameter combinations.
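The two behaviours described in the rundown and the SOM note can be sketched in base R. This is an illustration, not caret's source: the fixed `sigma_hat` stands in for `kernlab::sigest()`, and the cost sequence `2^((1:len) - 3)`, the `sigma` range, the SOM dimension ranges, and the 5x oversampling factor are all assumptions. The helper names `svm_grid` and `som_random_grid` are hypothetical. The first function reproduces the grid sizes per method (9 for `svmRadial`, 9 for `svmRadialCost`, 6 × 9 = 54 for `svmRadialSigma` at `len = 9`); the second shows oversample, filter, then truncate, where the final `head()` is the step whose absence would leave more rows than `tuneLength`.

```r
# Grid search sizes for the RBF SVM methods, per the rundown above.
# sigma_hat stands in for kernlab::sigest(); the cost sequence and
# sigma range are assumptions, not values taken from caret.
svm_grid <- function(method, len, sigma_hat = 0.5) {
  cost <- 2^((1:len) - 3)
  switch(method,
    # one sigma value, `len` cost values
    svmRadial = expand.grid(sigma = sigma_hat, C = cost),
    # sigma is estimated inside resampling, so only cost is tuned
    svmRadialCost = data.frame(C = cost),
    # at most 6 sigma values crossed with `len` cost values
    svmRadialSigma = expand.grid(
      sigma = seq(sigma_hat / 2, sigma_hat * 2, length.out = min(6, len)),
      C = cost
    ),
    stop("unknown method: ", method)
  )
}

sapply(c("svmRadial", "svmRadialCost", "svmRadialSigma"),
       function(m) nrow(svm_grid(m, len = 9)))
# svmRadial = 9, svmRadialCost = 9, svmRadialSigma = 54

# Random search with a validity filter, in the spirit of the SOM note:
# oversample, drop invalid (xdim, ydim) pairs, then keep at most `len`.
som_random_grid <- function(len, n_obs) {
  cand <- data.frame(
    xdim = sample(1:20, len * 5, replace = TRUE),
    ydim = sample(1:20, len * 5, replace = TRUE)
  )
  keep <- cand$xdim <= cand$ydim & cand$xdim * cand$ydim < n_obs
  cand <- unique(cand[keep, ])
  head(cand, len)  # truncation step: at most `len` rows survive
}

set.seed(42)
nrow(som_random_grid(len = 30, n_obs = nrow(iris)))  # at most 30
```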