
Saving H2O GridSearch as CSV


I have the following code:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
data=h2o.import_file('dataset.csv')
train, test = data.split_frame(ratios=[0.8])

n_trees = [50, 100, 200, 300]
max_depth = [5, 6, 7]
learn_rate = [0.01, 0.05, 0.1]
min_rows = [10,15,20]
min_split_improvement = [0.00001, 0.0001]
hyper_parameters = {"ntrees":n_trees, 
                   "max_depth":max_depth,
                   "learn_rate":learn_rate,
                   "min_rows":min_rows}

gs=H2OGridSearch(model=H2OGradientBoostingEstimator, hyper_params=hyper_parameters)
gs.train(x=train.columns, y=target_column, training_frame=train, validation_frame=test, distribution='bernoulli')

grid_perf=gs.get_grid(sort_by='auc',decreasing=True)

This runs a grid search over GBMs on the dataset. I want to save the result of the grid search, grid_perf, as a CSV file.

Something along the lines of: h2o.export_file(grid_perf,'grid_search_results.csv')

Note: the code above works, so no debugging necessary, thanks.

I tried the line above, but it fails with this error: Argument python_obj should be a None | list | tuple | dict | numpy.ndarray | pandas.DataFrame | scipy.sparse.issparse, got H2OGridSearch


Solution

  • Thanks to Adam Valenta for the suggestion. Using that, the solution is:

    grid_perf=gs.get_grid(sort_by='auc', decreasing=True)
    table = grid_perf._grid_json['summary_table'].as_data_frame()
    table.to_csv('GridSearch1.csv',index=False)
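
The snippet above depends on pandas and on the private `_grid_json` attribute. As a rough, pandas-free sketch of the same idea, the table can also be written with the standard `csv` module. This assumes the summary table exposes `col_header` and `cell_values` attributes (H2O's two-dimensional table objects appear to, but check your version). `write_table_csv` and `FakeTable` are hypothetical names, and the demo values are made up rather than real grid search output.

```python
import csv
import os
import tempfile


def write_table_csv(table, path):
    """Write a table's header and rows to `path` as CSV; return the row count.

    Assumption: `table` has `col_header` (list of column names) and
    `cell_values` (list of rows), as H2O summary tables appear to.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(table.col_header)
        writer.writerows(table.cell_values)
    return len(table.cell_values)


class FakeTable:
    """Stand-in for the summary table so the sketch runs without an H2O cluster."""
    col_header = ["learn_rate", "max_depth", "model_ids", "auc"]
    cell_values = [[0.1, 5, "model_1", 0.91], [0.05, 6, "model_2", 0.89]]


# Write the stand-in table to a temporary file and read it back.
demo_path = os.path.join(tempfile.gettempdir(), "grid_search_demo.csv")
rows_written = write_table_csv(FakeTable(), demo_path)

with open(demo_path, newline="") as f:
    round_trip = list(csv.reader(f))
```

With a real grid, the call would presumably look like write_table_csv(grid_perf._grid_json['summary_table'], 'GridSearch1.csv'), using the same table object as in the solution above.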