I am able to run the following example code and get an F1 score:
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

# import the airlines dataset:
# this dataset is used to classify whether a flight will be delayed ('YES') or not ('NO')
# original data can be found at http://www.transtats.bts.gov/
airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

# convert columns to factors
airlines["Year"] = airlines["Year"].asfactor()
airlines["Month"] = airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
airlines["Cancelled"] = airlines["Cancelled"].asfactor()
airlines["FlightNum"] = airlines["FlightNum"].asfactor()

# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier",
              "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"

# split into train and validation sets
train, valid = airlines.split_frame(ratios=[.8], seed=1234)

# train the model
airlines_gbm = H2OGradientBoostingEstimator(sample_rate=.7, seed=1234)
airlines_gbm.train(x=predictors,
                   y=response,
                   training_frame=train,
                   validation_frame=valid)

# retrieve the model performance
perf = airlines_gbm.model_performance(valid)
perf
With output like so:
ModelMetricsBinomial: gbm
** Reported on test data. **
MSE: 0.20546330299964743
RMSE: 0.4532806007316521
LogLoss: 0.5967028742962095
Mean Per-Class Error: 0.31720065289432364
AUC: 0.7414970113257631
AUCPR: 0.7616331690362552
Gini: 0.48299402265152613
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.35417599264806404:
           NO       YES      Error    Rate
NO         1641.0   2480.0   0.6018   (2480.0/4121.0)
YES        595.0    4011.0   0.1292   (595.0/4606.0)
Total      2236.0   6491.0   0.3524   (3075.0/8727.0)
...
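The F1 score itself can be read off the metrics object (a sketch using the perf object above; by default F1() reports the value at the max-F1 threshold):

print(perf.F1())  # [[threshold, F1]] at the threshold that maximizes F1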
However, my own dataset doesn't behave the same way, despite appearing to have the same form: its target variable is also a binary label. Some information about my dataset:
y_test.nunique()
failure 2
dtype: int64
Yet the performance metrics (perf) I get back are a much smaller subset of those in the example:
perf = gbm.model_performance(hf_test)
perf
ModelMetricsRegression: gbm
** Reported on test data. **
MSE: 0.02363221438767555
RMSE: 0.1537277281028883
MAE: 0.07460874699751764
RMSLE: 0.12362377397478382
Mean Residual Deviance: 0.02363221438767555
It is difficult to share my data due to its sensitive nature. Any ideas on what to check?
You're training a regression model, and that's why you're missing the binary classification metrics. H2O decides whether to train a regression or a classification model by looking at the data type of the response column.
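A quick way to see what H2O thinks your response column is (a sketch; hf_test and the column name failure are taken from your snippets):

# inspect the types H2O assigned at parse time
print(hf_test.types)                  # e.g. {'failure': 'int', ...} -- numeric, hence regression
print(hf_test["failure"].isfactor())  # [False] means the column is not categorical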
We explain this in the H2O User Guide, but it's a frequent question because it works differently from scikit-learn, which uses separate estimator classes for regression and classification and doesn't require you to think about column types.
Note the dtype: int64 in your own output:

y_test.nunique()
failure 2
dtype: int64

Your response is stored as integers, so H2O parses it as a numeric column and trains a regression model. To fix it, convert the response column in your training data to a factor:
train["response"] = train["response"].asfactor()
Alternatively, when you read the file in from disk, you can parse the response column as the "enum" type so you don't have to convert it after the fact. There are examples of how to do that in Python here. If the response is stored as integers, H2O assumes it's a numeric column when it reads the data from disk, but if the response is stored as strings, it will correctly parse it as a categorical (a.k.a. "enum") column and you won't need to specify or convert anything.
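For example, h2o.import_file accepts a col_types argument (a sketch; the file path and column name are placeholders):

# force the response column to be parsed as categorical at import time
hf_train = h2o.import_file("train.csv", col_types={"failure": "enum"})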