
A question about parallelism in the h2o.grid() function


I am using the h2o.grid() function from the h2o package in R to do some tuning. When I set the parallelism parameter larger than 1, it always shows this warning:

Some models were not built due to a failure, for more details run `summary(grid_object, show_stack_traces = TRUE)

Also, the model_ids in the final grid object include many models ending with _cv_1, _cv_2, etc., and the number of models does not match the max_models setting in my search_criteria list. I think these are just the models from the CV process, not the final models.

[Screenshot: result when "parallelism" is set larger than 1]

When I leave parallelism at its default or set it to 1, the result is normal: all models end with _model_1, _model_2, etc.

[Screenshot: result when "parallelism" is left at its default or set to 1]

Here is my code:

# set the grid
rf_h2o_grid <- list(mtries = seq(3, ncol(train_h2o), 4),
                    max_depth = c(5, 10, 15, 20))

# set the search_criteria
sc <- list(strategy = "RandomDiscrete", 
           seed = 100,
           max_models = 5
           )

# random grid tuning
rf_h2o_grid_tune_random <- h2o.grid(
  algorithm = "randomForest", 
  x = x, 
  y = y,
  training_frame = train_h2o,
  nfolds = 5,                     # use cv to validate the parameters
  fold_assignment = "Stratified",   
  ntrees = 100,
  seed = 100,
  hyper_params = rf_h2o_grid,
  search_criteria = sc
  # parallelism = 6           # when I set it larger than 1, the result always includes some "cv_" models
  )

So how can I use parallelism correctly in h2o.grid()? Thanks for helping!


Solution

  • This is an actual issue with parallelism in grid search, previously noticed but not reported correctly. Thanks for raising this, we'll try to fix it soon: see https://h2oai.atlassian.net/browse/PUBDEV-7886 if you want to track progress.

    Until it is properly fixed, you should avoid combining CV and parallelism in your grids.

    Regarding the following error:

    Some models were not built due to a failure, for more details run `summary(grid_object, show_stack_traces = TRUE)

    If the error is reproducible, you should get more details by running the grid with verbose=True. Adding the entire error message to the ticket above might also help.
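
As a rough illustration of the workaround above, here is a minimal sketch that keeps parallelism but drops cross-validation, scoring each model on a held-out validation frame instead. It assumes a running H2O cluster and the train_h2o, x, y, rf_h2o_grid and sc objects from the question. The helper name run_grid_without_cv, the 80/20 split and the default of 6 are illustrative assumptions, not anything taken from the H2O ticket.

```r
# Sketch only: assumes h2o.init() has been called and that train_h2o, x, y,
# rf_h2o_grid and sc exist as in the question.
# run_grid_without_cv is a hypothetical helper name, not part of h2o.
run_grid_without_cv <- function(train, x, y, hyper_params, search_criteria,
                                parallelism = 6) {
  # Hold out 20% of the rows for validation instead of using nfolds.
  splits <- h2o::h2o.splitFrame(train, ratios = 0.8, seed = 100)

  h2o::h2o.grid(
    algorithm = "randomForest",
    x = x,
    y = y,
    training_frame = splits[[1]],
    validation_frame = splits[[2]],  # replaces nfolds / fold_assignment
    ntrees = 100,
    seed = 100,
    hyper_params = hyper_params,
    search_criteria = search_criteria,
    parallelism = parallelism        # > 1 builds several models at once
  )
}

# Usage (needs a live cluster, so it is left commented out):
# grid <- run_grid_without_cv(train_h2o, x, y, rf_h2o_grid, sc)
# summary(grid, show_stack_traces = TRUE)  # details for any failed models
```

The trade-off is that a single validation split is noisier than 5-fold stratified CV. If CV matters more than speed, the other option is to keep nfolds and leave parallelism at its default of 1.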