python catboost clearml

[Catboost][ClearML] Error: if loss-function is Logloss, then class weights should be given for 0 and 1 classes


I recently started using ClearML for MLOps and ran into the following problem. When I run a script from my own computer that trains a CatBoost model on a binary classification problem with custom class weights, it works perfectly and logs the results without any issues. When I run the same script remotely through the ClearML agent, it fails with the following error:

Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/catboost_bind.py", line 102, in _fit
    return original_fn(obj, *args, **kwargs)
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 5007, in fit
    self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2262, in _fit
    train_params = self._prepare_train_params(
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2194, in _prepare_train_params
    _check_train_params(params)
  File "_catboost.pyx", line 6032, in _catboost._check_train_params
  File "_catboost.pyx", line 6051, in _catboost._check_train_params
_catboost.CatBoostError: catboost/private/libs/options/catboost_options.cpp:607: if loss-function is Logloss, then class weights should be given for 0 and 1 classes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/task_repository/RecSys.git/src/cli/model_training_remote.py", line 313, in <module>
    rfs.run(
  File "/root/.clearml/venvs-builds/3.9/task_repository/RecSys.git/src/cli/model_training_remote.py", line 232, in run
    model.fit(
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/__init__.py", line 36, in _inner_patch
    raise ex
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/__init__.py", line 34, in _inner_patch
    ret = patched_fn(original_fn, *args, **kwargs)
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/catboost_bind.py", line 110, in _fit
    return original_fn(obj, *args, **kwargs)
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 5007, in fit
    self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2262, in _fit
    train_params = self._prepare_train_params(
  File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2194, in _prepare_train_params
    _check_train_params(params)
  File "_catboost.pyx", line 6032, in _catboost._check_train_params
  File "_catboost.pyx", line 6051, in _catboost._check_train_params
_catboost.CatBoostError: catboost/private/libs/options/catboost_options.cpp:607: if loss-function is Logloss, then class weights should be given for 0 and 1 classes

The model parameters are defined in this dictionary:

    model_params = {
        "loss_function": "Logloss",
        "eval_metric": "AUC",
        "class_weights": {0: 1, 1: 60},
        "learning_rate": 0.1
    }

registered in the ClearML task as

    task.connect(model_params, 'model_params')

and used as parameters for the model in the following call:

    model = CatBoostClassifier(**model_params)

When running it from the container in ClearML interactive mode, it also works fine.


Solution

  • Disclaimer: I'm a member of the ClearML team

    I think I understand the problem. The issue is most likely this call:

        task.connect(model_params, 'model_params')

    because model_params is a nested dict:

        model_params = {
            "loss_function": "Logloss",
            "eval_metric": "AUC",
            "class_weights": {0: 1, 1: 60},
            "learning_rate": 0.1
        }
    

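    The keys of class_weights are stored as strings, but CatBoost expects int keys, hence the failure: with keys "0" and "1" it finds no weights for classes 0 and 1, which matches the error message. One option would be to remove the task.connect(model_params, 'model_params') call.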

    Another solution (until we fix it) would be to do:

        task.connect(model_params, 'model_params')
        # Rebuild class_weights with int keys; fall back to the int key
        # so the same code also works on a local run.
        model_params["class_weights"] = {
            0: model_params["class_weights"].get("0", model_params["class_weights"].get(0)),
            1: model_params["class_weights"].get("1", model_params["class_weights"].get(1)),
        }
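
The workaround above hard-codes the two class labels. A slightly more general version can be sketched as a small helper that converts any string key that looks like an integer back to an int. The name `restore_int_keys` is hypothetical (it is not part of ClearML or CatBoost), and the sketch assumes only the keys are affected, as described in the answer:

```python
def restore_int_keys(weights):
    """Return a copy of `weights` with integer-like string keys cast to int."""
    restored = {}
    for key, value in weights.items():
        if isinstance(key, str):
            try:
                key = int(key)
            except ValueError:
                pass  # keep non-numeric labels (e.g. "spam") unchanged
        restored[key] = value
    return restored


# Keys as the answer describes them after being stored (strings) ...
stored_style = {"0": 1, "1": 60}
# ... and as they are written in the script (ints).
local_style = {0: 1, 1: 60}

# Both variants normalize to the same int-keyed dict.
assert restore_int_keys(stored_style) == restore_int_keys(local_style)
```

In the training script this would be called right after `task.connect(...)`, for example `model_params["class_weights"] = restore_int_keys(model_params["class_weights"])`, before the parameters are passed to `CatBoostClassifier`.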