
Using weights in a model for unbalanced data

I'd like to know whether there is a way to set catboost weights as a function of unbalanced sample sizes. In my case the dataset has two areas, "a" and "b" (x_categorical_1): area "a" has 4 values (kept small just for the example) and area "b" has 3. I'd rather not draw several bootstrap samples of size 3; instead I'd like to give the model weights that account for the sample size of each area. Is this possible in catboost?

In my example:

# R language
# Package: catboost
# See: https://github.com/catboost/catboost/tree/master

# X data (predictors)
x_train <- data.frame(x_numeric_1 = c(1,2,3,4,5,6,7),
                      x_categorical_1 = factor(c("a","a","a","a","b","b","b")))

# Here I'd like to give weights as a function of the x_categorical_1 group size: "a" = 4 values (4/7 ≈ 0.6) and "b" = 3 values (3/7 ≈ 0.4):

# w data (observation weights)
w_train <- c(0.6,0.6,0.6,0.6,0.4,0.4,0.4)

# y data (target)
y_train <- c(1,3,1,4,1,5,1)

# Create the pool
x_train_learn_pool <- catboost::catboost.load_pool(x_train[1:2])

# Fit the model
model.f <- catboost.train(data = x_train_learn_pool,
    label = y_train,
    weight = w_train)

# Predict
test_predictions <- catboost::catboost.predict(model.f,x_train_learn_pool)
Error in catboost.train(data = x_train_learn_pool, label = y_train, weight = w_train) : 
  unused arguments (data = x_train_learn_pool, label = y_train, weight = w_train)

But when I try to use weights, the model fails with the error above. Could anyone help with this?


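  • There are a few errors in your code, which cause the 'unused arguments' error: catboost.train has no data, label, or weight arguments. The label and the weights belong to the pool, so pass them to catboost.load_pool, then pass the pool to catboost.train as learn_pool.

    The corrected code: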

    # Create the pool, attaching the labels and observation weights
    x_train_learn_pool <- catboost::catboost.load_pool(x_train[1:2], label = y_train, weight = w_train)
    # Fit the model
    model.f <- catboost::catboost.train(learn_pool = x_train_learn_pool)
    # Predict
    test_predictions <- catboost::catboost.predict(model.f, x_train_learn_pool)
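
If hard-coding the weights is inconvenient, they can be derived from the group sizes. The following is a minimal base-R sketch, assuming inverse-frequency ("balanced") weights are what is wanted. Note that this is a different scheme from the 0.6/0.4 weights in the question, which give the larger area the larger weight. The names `area`, `n_by_area`, `n`, and `k` are illustrative only.

```r
# Example data from the question: area "a" has 4 rows, area "b" has 3.
area <- factor(c("a", "a", "a", "a", "b", "b", "b"))

# Number of rows per area.
n_by_area <- table(area)

# Inverse-frequency weights: n / (k * n_g), where n is the total number
# of rows, k the number of areas, and n_g the size of the row's area.
# Each area then contributes the same total weight regardless of size.
n <- length(area)
k <- length(n_by_area)
w_train <- as.numeric(n / (k * n_by_area[as.character(area)]))

# Check: total weight per area (equal across areas by construction).
weight_by_area <- tapply(w_train, area, sum)
print(weight_by_area)
```

The resulting `w_train` can be passed as `weight = w_train` to `catboost::catboost.load_pool()`, as in the code above. Whether balanced weights actually help depends on the data, so it is worth comparing against the unweighted model.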