I'm using caret to compare models for a classification problem with nested CV: V-fold in the outer loop and bootstrap (500 replicates) in the inner loop. I get this warning after training knn:
Warning: There were missing values in resampled performance measures.
I believe this comes from the fact that some resamples have zero items of the class of interest in the holdout sample, yielding NA for Sensitivity and ROC. My question is: is there any way to ensure that items from this class are present in every bootstrap resample, similar to what the createDataPartition function does? (I believe this is also called stratified bootstrap.)
If not, how should we proceed with this? (In terms of comparing model performance on the same resamples)
Thanks!
I couldn't find a way to do this within caret, but here is a workaround using the rsample package. The idea is to compute the resamples beforehand and pass them to the trainControl function via the index and indexOut arguments, after converting them to caret's format with rsample2caret.
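library(rsample)
library(caret)
# Stratified bootstrap resamples, converted to caret's index/indexOut lists
indices <- bootstraps(train, times = 50, strata = "class_of_interest")
indices <- rsample2caret(indices)
train_control <- trainControl(method = "boot", number = 50, index = indices$index, indexOut = indices$indexOut)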
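One caveat worth checking: stratifying fixes the number of class-of-interest rows in each bootstrap sample, but as far as I can tell it does not by itself guarantee that any of them end up in the holdout (out-of-bag) set. Below is a minimal sketch of a sanity check, assuming the holdout indices are a list of integer row vectors like the indexOut element above. The helper name holdouts_missing_class and the toy data are made up for illustration, and the stratified bootstrap here is a base R stand-in so the snippet runs without extra packages. It is not what rsample does internally.

```r
# Hypothetical helper: count the holdout sets that contain no rows of `positive`.
# `index_out` is a list of integer row vectors (the shape caret expects for indexOut).
holdouts_missing_class <- function(index_out, y, positive) {
  has_positive <- vapply(index_out, function(rows) any(y[rows] == positive), logical(1))
  sum(!has_positive)
}

# Toy stand-in for the real outcome column: 95 "neg" rows and 5 "pos" rows.
set.seed(42)
y <- factor(c(rep("neg", 95), rep("pos", 5)))

# Base R stratified bootstrap: resample row numbers within each class,
# then treat the rows that were never drawn as the holdout set.
rows_by_class <- split(seq_along(y), y)
index <- lapply(seq_len(200), function(i) {
  unlist(lapply(rows_by_class, function(r) r[sample.int(length(r), replace = TRUE)]),
         use.names = FALSE)
})
index_out <- lapply(index, function(rows) setdiff(seq_along(y), rows))

# Number of resamples whose holdout has no "pos" rows at all.
n_bad <- holdouts_missing_class(index_out, y, "pos")
n_bad
```

With the real objects, the call would look something like holdouts_missing_class(indices$indexOut, train$class_of_interest, "your_positive_level"), where the level name is a placeholder. With only 5 rows in the class, a within-class resample draws all 5 distinct rows with probability 5!/5^5, about 3.8%, and then the holdout has none of them. So for very small classes a few NA resamples may remain even with stratification.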
Hope this helps.