I am trying to use k-fold cross-validation on a WESBROOK dataset, using the train function from the caret package. So far this function has worked for me with methods such as svm, knn and rpart; however, with the nb (naive Bayes) method I get the following error:
Error in { :
task 1 failed - "Not all variable names used in object found in newdata"
This is what my train call looks like:
k_folds <- 5
train_control <- trainControl(method = "cv", number = k_folds, classProbs = TRUE, summaryFunction = twoClassSummary)
nb_model <- train(
  TOTLGIVE ~ ., data = train_data,
  method = "nb",
  trControl = train_control
)
I have checked: there are no missing values, and the column names and their types are the same in the training and test sets.
Check whether the levels of factor variables are the same:
lapply(train_data, levels)
lapply(test_data, levels)
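Comparing the two printed lists by eye gets tedious with many columns. A minimal sketch of an automated comparison, using base R only, might look like the following. The toy data frames and the REGION column are hypothetical stand-ins for the real train_data and test_data.

```r
# Toy stand-ins for train_data and test_data (hypothetical values).
train_data <- data.frame(
  TOTLGIVE = factor(c("low", "high", "low")),
  REGION   = factor(c("east", "west", "north"))
)
test_data <- data.frame(
  TOTLGIVE = factor(c("high", "low")),
  REGION   = factor(c("east", "south"))
)

# Columns present in both data frames that are factors in the training set.
common <- intersect(names(train_data), names(test_data))
factor_cols <- common[vapply(train_data[common], is.factor, logical(1))]

# TRUE for every column whose set of levels differs between the two sets.
mismatch <- vapply(
  factor_cols,
  function(col) !setequal(levels(train_data[[col]]), levels(test_data[[col]])),
  logical(1)
)

# Names of the columns worth inspecting.
mismatched_cols <- names(mismatch)[mismatch]
print(mismatched_cols)
```

Any column listed in mismatched_cols has levels that appear in one set but not the other, which is worth ruling out before looking elsewhere.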
In general, after modifying columns it is good practice to make sure the training set and the test set share the same column names. This can prevent such problems.
library(dplyr)
# Keep only the test columns that also exist in the training set
test_data <- test_data %>%
  select(all_of(intersect(names(train_data), names(test_data))))
If the error persists, try the alternative naive_bayes method, which uses the naivebayes package.
nb_model <- train(
  TOTLGIVE ~ ., data = train_data,
  method = "naive_bayes",
  trControl = train_control
)
Note: This method has its own tuning parameters (laplace, usekernel and adjust). Read more about the methods available to the train function here: https://topepo.github.io/caret/train-models-by-tag.html
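As an illustration of those tuning parameters, here is a rough sketch that passes an explicit grid to train through its tuneGrid argument. It assumes the caret and naivebayes packages are installed. The toy data frame, its x1 and x2 columns and the chosen grid values are made up for the example, and it uses plain accuracy rather than twoClassSummary to stay short.

```r
library(caret)  # the naivebayes package must also be installed

set.seed(1)
# Hypothetical two-class data standing in for train_data.
toy <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  TOTLGIVE = factor(sample(c("low", "high"), 100, replace = TRUE))
)

# Candidate values for the three tuning parameters of "naive_bayes".
nb_grid <- expand.grid(
  laplace   = c(0, 1),
  usekernel = c(FALSE, TRUE),
  adjust    = 1
)

# 5-fold cross-validation over every row of the grid.
nb_model <- train(
  TOTLGIVE ~ ., data = toy,
  method = "naive_bayes",
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = nb_grid
)

# Parameter combination with the best cross-validated accuracy.
nb_model$bestTune
```

With the real data, toy would be replaced by train_data and the trainControl call by the existing train_control object.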