I am trying to understand how the confusion matrix in `h2o.explain` is generated. If I use the following code: `h2o.explain(model@leader, test_set, include_explanations = "confusion_matrix")`, is the generated confusion matrix evaluating the model's accuracy on the test set? How would this differ from using `h2o.predict` on the test set (e.g. `h2o.predict(model@leader, test_set)`)?
Yes, `h2o.explain` uses the provided `test_set`, so the confusion matrix is evaluated on the test data. In your case it is generated by `h2o.confusionMatrix(object = model@leader, newdata = test_set)`.
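To see this side by side, here is a minimal sketch, assuming `model` is your H2OAutoML object and `test_set` is an H2OFrame as in your question:

```r
library(h2o)

# What h2o.explain(..., include_explanations = "confusion_matrix") displays:
h2o.explain(model@leader, test_set, include_explanations = "confusion_matrix")

# The same matrix, computed directly on the test set:
h2o.confusionMatrix(object = model@leader, newdata = test_set)
```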
The confusion matrix aggregates the predictions you would get from `h2o.predict`, giving you a high-level view of how the model performs. `h2o.predict`, by contrast, returns the individual row-level predictions without any aggregation.
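A sketch of that relationship, assuming the actual response column in `test_set` is named `"y"` (a placeholder; substitute your own column name). Tabulating the row-level predictions against the actuals yourself yields essentially the same counts, though for binary models H2O picks the label threshold per dataset (max-F1 by default), so the numbers can differ slightly from `h2o.confusionMatrix`:

```r
preds <- h2o.predict(model@leader, test_set)  # one row per test record
head(preds)                                   # predicted label + class probabilities

# Aggregate the row-level predictions into a confusion matrix manually
df <- as.data.frame(preds)
actual <- as.data.frame(test_set)[["y"]]      # replace "y" with your response column
table(actual = actual, predicted = df$predict)
```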