Is there a way to control observation weights in catboost as a function of unbalanced sample sizes? My dataset has two areas, "a" and "b" (x_categorical_1): area "a" has 4 values (kept small just for the example) and area "b" has 3 values. I don't want to run several bootstraps of size 3; instead I'd like to give the model weights that account for the sample size of each area. Is this possible in catboost?
In my example:
# R language
# Package: catboost
# See: https://github.com/catboost/catboost/tree/master
library(catboost)
# X data (predictors)
x_train <- data.frame(x_numeric_1=c(1,2,3,4,5,6,7),
x_numeric_2=c(1,3,5,1,3,5,1),
x_categorical_1=c("a","a","a","a","b","b","b"))
# Here I'd like to assign weights as a function of the x_categorical_1 group size: "a" = 4 values (4/7 ≈ 0.6) and "b" = 3 values (3/7 ≈ 0.4):
# w data (observation weights)
w_train <- c(0.6,0.6,0.6,0.6,0.4,0.4,0.4)
# y data (target)
y_train <- c(1,3,1,4,1,5,1)
# Create the pool
x_train_learn_pool <- catboost::catboost.load_pool(x_train[1:2])
# Fit the model
model.f <- catboost.train(data = x_train_learn_pool,
label = y_train,
weight = w_train
)
# Predict with the fitted model
test_predictions <- catboost::catboost.predict(model.f,x_train_learn_pool)
test_predictions
But when I try to use weight, my model doesn't work and I get this error:
Error in catboost.train(data = x_train_learn_pool, label = y_train, weight = w_train) :
unused arguments (data = x_train_learn_pool, label = y_train, weight = w_train)
Any help, please?
There are a few errors in your code that cause the 'unused arguments' error.
catboost.train has no arguments named data, label or weight.
The label and weight should instead be passed to the catboost.load_pool function; see the documentation for catboost.load_pool.
Below is a code example of how it should be:
# Create the pool (features, labels and observation weights)
x_train_learn_pool <- catboost::catboost.load_pool(x_train[1:2], label = y_train, weight = w_train)
# Fit the model
model.f <- catboost::catboost.train(learn_pool = x_train_learn_pool)
# Predict on the training pool
test_predictions <- catboost::catboost.predict(model.f, x_train_learn_pool)
test_predictions
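One note on the weights themselves: 0.6 and 0.4 are roughly the group proportions (4/7 and 3/7), so they give the larger area more weight per row. If the aim is for both areas to contribute equally, inverse-frequency weights are a common choice. Below is a minimal sketch in base R, assuming that is the intent; the names n_by_area, w_prop and w_balanced are only illustrative and not part of catboost.

```r
# Area labels from the question: "a" has 4 rows, "b" has 3 rows
area <- c("a", "a", "a", "a", "b", "b", "b")

# Number of rows in each area
n_by_area <- table(area)

# Per-row group size, looked up by area name
n_row <- as.numeric(n_by_area[area])

# Proportional weights as in the question: n_group / n_total
w_prop <- n_row / length(area)

# Inverse-frequency ("balanced") weights: n_total / (n_groups * n_group)
# Each area's weights then sum to the same total
w_balanced <- length(area) / (length(n_by_area) * n_row)

print(round(w_prop, 2))
print(tapply(w_balanced, area, sum))
```

With these weights each area sums to the same total weight, and w_balanced could be passed as weight = w_balanced to catboost.load_pool in place of w_train.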