r cross-validation r-caret naivebayes

Cross-validation with nb method


I am trying to run k-fold cross-validation on the WESBROOK dataset, using the train function from the caret package. So far train has worked for me with methods such as svm, knn and rpart. With the nb (naive Bayes) method, however, I get the following error:

Error in { : 
  task 1 failed - "Not all variable names used in object found in newdata"

This is what my train call looks like:

k_folds <- 5
train_control <- trainControl(method = "cv", number = k_folds, classProbs = TRUE, summaryFunction = twoClassSummary)

nb_model <- train(
  TOTLGIVE ~ ., data = train_data,
  method = "nb",
  trControl = train_control
)

I have checked: there are no missing values, and the column names and types are the same in the training and test sets.


Solution

  • Check whether the factor variables have the same levels in the training set and the test set:

    lapply(train_data, levels)
    lapply(test_data, levels)
    

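    In general, after changing any columns it is good practice to make sure the training set and the test set still share the same column names. The following keeps only the columns of test_data that also exist in train_data, which can prevent this kind of problem: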

    library(dplyr)
    test_data <- test_data %>%
      select(intersect(names(train_data), names(test_data)))
    

    If the error persists, try the alternative naive_bayes method, which is backed by the naivebayes package:

    nb_model <- train(
      TOTLGIVE ~ ., data = train_data,
      method = "naive_bayes",
      trControl = train_control
    )
    

    Note: the naive_bayes method has its own set of tuning parameters (laplace, usekernel and adjust), which differs from the one used by nb. Read more about the available train methods here: https://topepo.github.io/caret/train-models-by-tag.html
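
The two checks above (column names and factor levels) can be combined into one pass. The following is a minimal sketch in base R, not a definitive fix: the two small data frames and the REGION and AGE columns are hypothetical stand-ins for the real train_data and test_data, and only TOTLGIVE comes from the question. It reports columns missing from either set, finds factor columns whose levels differ, and re-levels the test columns to match the training levels.

```r
# Hypothetical toy data standing in for the real train_data / test_data.
train_data <- data.frame(
  TOTLGIVE = factor(c("low", "high", "low", "high")),
  REGION   = factor(c("east", "west", "north", "east")),
  AGE      = c(34, 51, 27, 45)
)
test_data <- data.frame(
  TOTLGIVE = factor(c("high", "low")),
  REGION   = factor(c("west", "east")),
  AGE      = c(39, 62)
)

# Columns that exist in one set but not in the other.
missing_in_test  <- setdiff(names(train_data), names(test_data))
missing_in_train <- setdiff(names(test_data), names(train_data))

# Factor columns shared by both sets.
is_fac      <- vapply(train_data, is.factor, logical(1))
factor_cols <- intersect(names(train_data)[is_fac], names(test_data))

# Keep the names of the columns whose level sets are not identical.
same_levels <- vapply(factor_cols, function(col) {
  identical(levels(train_data[[col]]), levels(test_data[[col]]))
}, logical(1))
mismatched <- factor_cols[!same_levels]

# Re-level the mismatched test columns using the training levels.
# Test values that never occur in training would become NA here.
for (col in mismatched) {
  test_data[[col]] <- factor(test_data[[col]],
                             levels = levels(train_data[[col]]))
}

print(missing_in_test)
print(missing_in_train)
print(mismatched)
```

With the real data, any non-empty result in missing_in_test, missing_in_train or mismatched points to a column worth inspecting before calling train again. Re-levelling silently turns unseen test values into NA, so it is worth checking for new NAs afterwards.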