I would like to use a CatBoost regressor with a Poisson objective for insurance applications. Since I need to account for exposure, how can I set log_exposure as an offset? With XGBoost I use "base_margin", and with LightGBM I use "init_score". Is there an equivalent in CatBoost?
After looking through the documentation, I found a viable solution. The fit method of both CatBoostRegressor and CatBoostClassifier provides a baseline and a sample_weight parameter, which can be used directly to set an offset (for prior exposure) or a sample weight (for severity modeling).
By the way, the best approach is to create Pools and specify the offset and weights there:
from catboost import Pool
freq_train_pool = Pool(data=freq_train_ds, label=claim_nmb_train.values, cat_features=xvars_cat, baseline=claim_model_offset_train.values)
freq_valid_pool = Pool(data=freq_valid_ds, label=claim_nmb_valid.values, cat_features=xvars_cat, baseline=claim_model_offset_valid.values)
freq_test_pool = Pool(data=freq_test_ds, label=claim_nmb_test.values, cat_features=xvars_cat, baseline=claim_model_offset_test.values)
Here the data parameter contains a pd.DataFrame with the predictors only, label is the actual number of claims, cat_features is a list of strings naming the categorical columns, and baseline is the np.array of log exposure. It works.
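The reason the baseline has to be the log of the exposure, not the exposure itself, is that the baseline is added to the raw score before the exponential link of the Poisson objective is applied. A small numeric check with made-up values:

```python
import numpy as np

raw_score = np.array([-2.0, -1.5, -1.0])   # F(x): log of the annual claim rate
exposure = np.array([0.25, 0.5, 1.0])      # fraction of a year in force

# Offset on the log scale, as supplied through baseline.
with_offset = np.exp(raw_score + np.log(exposure))

# Same quantity written as annual rate times exposure.
rate_times_exposure = exposure * np.exp(raw_score)
```

Both arrays hold the same expected claim counts, so the trees only have to learn the rate per unit of exposure.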
Using Pools also makes it possible to provide evaluation sets, each with its own offset, in the fit method.