Having recently started using ClearML to manage the MLOps, I am facing the following problem: When running a script that trains a CatBoost in a binary classification problem using different class weights from my computer, it works perfectly, logs the results and no issues at all. Once I try to run that remotely using the ClearML agent, it results in the following error:
<!-- language: lang-none -->
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/catboost_bind.py", line 102, in _fit
return original_fn(obj, *args, **kwargs)
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 5007, in fit
self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2262, in _fit
train_params = self._prepare_train_params(
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2194, in _prepare_train_params
_check_train_params(params)
File "_catboost.pyx", line 6032, in _catboost._check_train_params
File "_catboost.pyx", line 6051, in _catboost._check_train_params
**_catboost.CatBoostError: catboost/private/libs/options/catboost_options.cpp:607: if loss-function is Logloss, then class weights should be given for 0 and 1 classes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):**
File "/root/.clearml/venvs-builds/3.9/task_repository/RecSys.git/src/cli/model_training_remote.py", line 313, in <module>
rfs.run(
File "/root/.clearml/venvs-builds/3.9/task_repository/RecSys.git/src/cli/model_training_remote.py", line 232, in run
model.fit(
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/__init__.py", line 36, in _inner_patch
raise ex
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/__init__.py", line 34, in _inner_patch
ret = patched_fn(original_fn, *args, **kwargs)
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/catboost_bind.py", line 110, in _fit
return original_fn(obj, *args, **kwargs)
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 5007, in fit
self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2262, in _fit
train_params = self._prepare_train_params(
File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2194, in _prepare_train_params
_check_train_params(params)
File "_catboost.pyx", line 6032, in _catboost._check_train_params
File "_catboost.pyx", line 6051, in _catboost._check_train_params
**_catboost.CatBoostError: catboost/private/libs/options/catboost_options.cpp:607: if loss-function is Logloss, then class weights should be given for 0 and 1 classes**
I do have the dictionary being connected:
model_params = {
"loss_function": "Logloss",
"eval_metric": "AUC",
"class_weights": {0: 1, 1: 60},
"learning_rate": 0.1
}
registered in the ClearML task as
task.connect(model_params, 'model_params')
and used as parameters for the model in the following call:
model = CatBoostClassifier(**model_params)
When running it from the container in ClearML interactive mode, it also works fine.
Disclaimer: I'm a team members of ClearML
I think I understand the problem, basically I think the issue is:
task.connect(model_params, 'model_params')
Since this is a nested dict:
model_params = {
"loss_function": "Logloss",
"eval_metric": "AUC",
"class_weights": {0: 1, 1: 60},
"learning_rate": 0.1
}
The class_weights is stored as a String
key, but catboost
expects int
key, hence failing.
One option would be to remove the task.connect(model_params, 'model_params')
Another solution (until we fix it) would be to do:
task.connect(model_params, 'model_params')
model_params["class_weights"] = {
0: model_params["class_weights"].get("0", model_params["class_weights"].get(0))
1: model_params["class_weights"].get("1", model_params["class_weights"].get(1))
}