I want to retrain my H2O model on a new set of observations using a checkpoint, but I am running into errors. My code fails at the train step when checkpoint is set. The original model was created with H2O AutoML, and I verified that aml.leader is a GBM model.
The error says that the max_depth field cannot be modified. However, I am not modifying the max_depth parameter in the gbm_continued definition.
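import h2o
from h2o.automl import H2OAutoML
from h2o.estimators import H2OGradientBoostingEstimator
# ds_file is my local dataset with 4k rows
ds = h2o.import_file(ds_file)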
splits = ds.split_frame(ratios= [0.8], seed=1)
train = splits[0]
test = splits[1]
aml = H2OAutoML(max_runtime_secs = 60, seed = 1 , project_name = 'test')
aml.train(y=y, training_frame = train, leaderboard_frame = test)
#verify that aml.leader is the GBM model
print(aml.leader)
#H2OGradientBoostingEstimator : Gradient Boosting Machine
#Model Key: GBM_1_AutoML_1_20230727_145804
#ds2_file is my local dataset with 30k rows
ds2 = h2o.import_file(ds2_file)
splits2 = ds2.split_frame(ratios= [0.8], seed=1)
train2 = splits2[0]
test2 = splits2[1]
gbm_continued = H2OGradientBoostingEstimator(model_id = 'gbm_continued', checkpoint = aml.leader)
gbm_continued.train(x=predictors, y = y, training_frame = train2)
Here is the error message:
>>> gbm_continued.train(x=predictors, y = y, training_frame = train2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "h2o-dev/lib/lib/python3.8/site-packages/h2o/estimators/estimator_base.py", line 108, in train
self._train(parms, verbose=verbose)
File "dev_items/h2o-dev/lib/lib/python3.8/site-packages/h2o/estimators/estimator_base.py", line 187, in _train
model_builder_json = h2o.api("POST /%d/ModelBuilders/%s" % (rest_ver, self.algo), data=parms)
File "h2o-dev/lib/lib/python3.8/site-packages/h2o/h2o.py", line 124, in api
return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
File "h2o-dev/lib/lib/python3.8/site-packages/h2o/backend/connection.py", line 498, in request
return self._process_response(resp, save_to)
File "h2o-dev/lib/lib/python3.8/site-packages/h2o/backend/connection.py", line 852, in _process_response
raise H2OResponseError(data)
h2o.exceptions.H2OResponseError: ModelBuilderErrorV3 (water.exceptions.H2OModelBuilderIllegalArgumentException):
timestamp = 1690566243266
error_url = '/3/ModelBuilders/gbm'
msg = 'Illegal argument(s) for GBM model: gbm_continued. Details: ERRR on field: _max_depth: Field _max_depth cannot be modified if checkpoint is specified!\nERRR on field: _ntrees: If checkpoint is specified then requested ntrees must be higher than 409'
dev_msg = 'Illegal argument(s) for GBM model: gbm_continued. Details: ERRR on field: _max_depth: Field _max_depth cannot be modified if checkpoint is specified!\nERRR on field: _ntrees: If checkpoint is specified then requested ntrees must be higher than 409'
http_status = 412
I found one related question on this topic, but it does not address this problem.
To get around the error that you are getting, please try this:
# h2o.get_model expects a model ID string, so pass the leader's model_id
gbm_autoML = h2o.get_model(aml.leader.model_id)
gbm_continued = H2OGradientBoostingEstimator(
    model_id = 'gbm_continued',
    max_depth = gbm_autoML.actual_params['max_depth'],
    ntrees = gbm_autoML.actual_params['ntrees'] + 2,
    checkpoint = aml.leader)
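Continuing to train a GBM model means adding more trees to it, which is why I added 2 to the ntrees parameter. Feel free to change the 2 to any other value, as long as it is >= 1. max_depth is passed explicitly because, when it is left unset, the new estimator presumably falls back to its own default, which differs from the value stored in the checkpoint and triggers the "cannot be modified" error.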
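If you want to reuse this pattern, the parameter bookkeeping can be pulled into a small helper. The following is only a sketch: the function name continuation_params and the locked argument are hypothetical, not part of the H2O API, and the default locked list contains only max_depth because that is the one field the error message confirms. Other tree parameters may also need to match the checkpoint and can be added to locked. The helper works on a plain dict, so it runs without an H2O cluster.

```python
def continuation_params(actual_params, extra_trees=2, locked=("max_depth",)):
    """Build keyword arguments for an estimator that resumes from a checkpoint.

    actual_params: dict of the checkpointed model's parameters
                   (for example aml.leader.actual_params).
    extra_trees:   number of trees to add on top of those already built.
    locked:        names of parameters that must be copied unchanged
                   (assumption: only max_depth is confirmed by the error above).
    """
    if extra_trees < 1:
        raise ValueError("extra_trees must be at least 1")
    # Copy the parameters that must match the checkpoint exactly.
    params = {name: actual_params[name] for name in locked}
    # The requested ntrees has to exceed the number of trees already trained.
    params["ntrees"] = actual_params["ntrees"] + extra_trees
    return params


# Illustrative values only; a real call would pass the leader's actual_params.
example = continuation_params({"max_depth": 7, "ntrees": 409, "learn_rate": 0.1})
print(example)
```

With a running cluster, the result could then be unpacked into the estimator, for example H2OGradientBoostingEstimator(model_id = 'gbm_continued', checkpoint = aml.leader, **continuation_params(aml.leader.actual_params)).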
Hope this helps and good luck.