machine-learning, tree, xgboost, r-caret

Non-tree model error when applying the varImp function to an xgbTree model trained with caret and target-variable weights


When I create a model with the 'train' function from the caret package to do gradient boosting with weights, the 'varImp' function throws an error saying it did not detect a tree model.

The code below reproduces the error:

set.seed(123)

# observation weights: inverse class frequency, scaled by 0.5
model_weights <- ifelse(modelo_df_sseg$FATALIDADES == 1,
                        yes = (1/table(modelo_df_sseg$FATALIDADES)[2]) * 0.5,
                        no = (1/table(modelo_df_sseg$FATALIDADES)[1]) * 0.5
                        )

model <- train(
  as.factor(FATALIDADES) ~.,
  data = modelo_df_sseg, 
  method = "xgbTree",
  trControl = trainControl("cv", number = 10),
  weights = model_weights
  )

varImp(model)

But if I don't apply weights it works.

Why doesn't varImp recognize my tree model?

EDIT 04-SEP-2020

In the comments, missuse suggested using wts instead of weights. Now I get the error below:

Error in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : formal argument 'wts' matched by multiple actual arguments

I made a small example with the Arrests dataset from the car package so you can test it yourself:

library(caret)
library(car) # for the Arrests data set

set.seed(123)

basex <- Arrests

model_weights <- ifelse(basex$released == 2,
                        yes = (1/table(basex$released)[2]) * 0.5,
                        no = (1/table(basex$released)[1]) * 0.5
                        )

y = basex$released
x = basex
tc = trainControl("cv", number = 10)

mtd = "xgbTree"
model <- train(
  x, 
  y, 
  method = mtd,
  trControl = tc, 
  wts = model_weights,
  verbose = TRUE
  )

Maybe I'm creating the weights vector wrong, but I can't find any documentation on the 'wts' parameter.


Solution

  • The example code has several problems.

    The correct way to apply weights in caret is using the weights argument to train.
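
    In short it looks like the sketch below (here df, outcome and w are placeholders for your data frame, outcome column and weight vector):

    # observation weights go through the 'weights' argument of caret::train()
    fit <- caret::train(outcome ~ ., data = df,
                        method = "xgbTree",
                        weights = w)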

    I was mistaken in the comments where I recommended using the argument wts. My error came from the xgbTree source, specifically the lines:

    if (!is.null(wts))
        xgboost::setinfo(x, 'weight', wts)
    

    which suggested wts might be the correct argument.
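
    (If you want to look at that model code yourself, caret exposes it through getModelInfo(); the sketch below just prints the fit function used for the xgbTree method.)

    library(caret)

    # pull the model definition caret uses for method = "xgbTree"
    xgb_info <- getModelInfo("xgbTree", regex = FALSE)[["xgbTree"]]

    # print its fit function; the setinfo(x, 'weight', wts) line quoted above is in here
    xgb_info$fit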

    Let's go through the example and fix all the problems.

    library(caret)
    library(car) #for the data set
    library(tidyverse) #because I like to use it
    
    data(Arrests)
    basex <- Arrests
    
    
    table(basex$released) #released is the outcome class
    
      No  Yes 
     892 4334 
    

    Here we see "Yes" outcome is much more frequent then "No" outcome. This will skew the predicted probabilities and favor a model which will tend to predict "Yes". One way to fix it is to give higher weight to the "No" observations. A meaningful weight for the "No" observations would be the proportion of the "Yes" class, and a meaningful weight for the "Yes" observations would be the proportion of the "No" class:

    model_weights <- ifelse(basex$released == "Yes",
                            table(basex$released)[1]/nrow(basex),
                            table(basex$released)[2]/nrow(basex))
    

    The two weight values (one per class) sum to 1:

    head(data.frame(basex,
                    weights = model_weights))
      released colour year age    sex employed citizen checks  weights
    1      Yes  White 2002  21   Male      Yes     Yes      3 0.170685
    2       No  Black 1999  17   Male      Yes     Yes      3 0.829315
    3      Yes  White 2000  24   Male      Yes     Yes      3 0.170685
    4       No  Black 2000  46   Male      Yes     Yes      1 0.829315
    5      Yes  Black 1999  27 Female      Yes     Yes      1 0.170685
    6      Yes  Black 1998  16 Female      Yes     Yes      0 0.170685
    

    "Yes" is more frequent so we give it a lesser weight.

    From the above we can see that the data frame has several categorical predictors (colour, sex, ...). xgbTree cannot handle them directly, so you will need to convert them to numeric prior to modeling. One way to convert categorical predictors to numeric is dummy coding; there are other ways, but they are outside the scope of this answer.

    To use dummy coding:

    dummies <- dummyVars(released ~ ., data = basex)
    x <- predict(dummies, newdata = basex)
    head(x)
    colour.Black colour.White year age sex.Female sex.Male employed.No employed.Yes citizen.No citizen.Yes checks
    1            0            1 2002  21          0        1           0            1          0           1      3
    2            1            0 1999  17          0        1           0            1          0           1      3
    3            0            1 2000  24          0        1           0            1          0           1      3
    4            1            0 2000  46          0        1           0            1          0           1      1
    5            1            0 1999  27          1        0           0            1          0           1      1
    6            1            0 1998  16          1        0           0            1          0           1      0
    
    y <- basex$released
    

    Now we have our weights, x and y.

    Since I will fit several models below, I will first create the resampling folds and pass them to each call to train so that the folds do not differ between models.

    folds <- createFolds(basex$released, 10)
    

    Since there is an imbalance in the class frequencies, I will use twoClassSummary so we can see the sensitivity and specificity of the trained models:

    tc <- trainControl(method = "cv",
                       number = 10,
                       summaryFunction = twoClassSummary,
                       index = folds, #predefined folds
                       classProbs = TRUE) #needed for twoClassSummary
    
    mtd <- "xgbTree"
    
    model <- train(x = x, 
                   y = y, 
                   method = mtd,
                   trControl = tc, 
                   weights = model_weights,
                   verbose = TRUE,
                   metric = "ROC")
    

    #no errors

    model$results %>%
      filter(ROC == max(ROC))
      eta max_depth gamma colsample_bytree min_child_weight subsample nrounds       ROC      Sens     Spec       ROCSD     SensSD     SpecSD
    1 0.3         1     0              0.8                1         1      50 0.7031076 0.6185944 0.693945 0.009074758 0.03516597 0.01536701
    

    Here we see that, with the model weights, the model with the highest AUC has a sensitivity of 0.6185944 and a specificity of 0.693945.

    Without the weights:

    model2 <- train(x = x,
                    y = y,
                    method = mtd,
                    trControl = tc,
                    verbose = TRUE,
                    metric = "ROC")
    

    #no errors

    model2$results %>%
      filter(ROC == max(ROC))
      eta max_depth gamma colsample_bytree min_child_weight subsample nrounds      ROC      Sens      Spec     ROCSD     SensSD     SpecSD
    1 0.3         1     0              0.8                1      0.75      50 0.701109 0.1000325 0.9713885 0.0101395 0.03343579 0.01236701
    

    The model without the weights has a sensitivity of 0.1000325 and a specificity of 0.9713885.

    So the meaningful weights fixed the model's tendency to predict "Yes" all the time.
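
    And to come back to the original question: with the weights passed this way, variable importance should now run without the non-tree model error (a quick check, assuming the weighted model above trained successfully):

    # no "non-tree model" error any more
    varImp(model)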