machine-learning h2o

H2O Variable standardization


The standardize section of the documentation (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/standardize.html) lists only these algorithms: Deep Learning, GLM, GAM, K-Means.

I have two questions:

  1. Does this mean that other algorithms, such as Random Forest and Gradient Boosting, do not standardize the predictors (at least not automatically in AutoML)?

  2. Does standardize = TRUE in Deep Learning, GLM, etc. also standardize the target variable, or only the features?

A related question is "Feature Standardize in AutoML H2O".


Solution

  • Regarding question 1: Correct. For algorithms that do not have the standardize parameter, the predictors are not standardized. Tree-based algorithms decide which child node a row goes to with comparisons like val >= threshold. If we implemented standardization, we would have to evaluate (val - mean) / standard deviation >= threshold instead. Standardization is a monotonic rescaling of each predictor, so it would not change which rows fall on each side of a split. Choosing not to standardize therefore saves us a lot of time during tree traversal, because we do not need to standardize the predictors every time we evaluate val >= threshold.

    Regarding question 2: When you set standardize=true, only the numerical features are standardized. The response column is not standardized.
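
To illustrate the point about tree splits, here is a minimal sketch in plain Python. It is not H2O's implementation; the helper names (standardize, goes_right) and the sample values are hypothetical and chosen only for illustration. It shows that a split on a raw value sends every row to the same side as the equivalent split on the standardized value, provided the threshold is rescaled the same way.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return the z-scores of values, plus the mean and std dev used."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation, assumed non-zero
    return [(v - mu) / sigma for v in values], mu, sigma


def goes_right(val, threshold):
    """Tree-style split test: True means the row goes to the right child."""
    return val >= threshold


# Hypothetical predictor column and split threshold (illustration only).
raw = [3.0, 10.0, 25.0, 40.0, 120.0]
raw_threshold = 25.0

# Standardize the column and rescale the threshold with the same mean/std dev.
scaled, mu, sigma = standardize(raw)
scaled_threshold = (raw_threshold - mu) / sigma

raw_sides = [goes_right(v, raw_threshold) for v in raw]
scaled_sides = [goes_right(z, scaled_threshold) for z in scaled]

# Both versions of the split route every row to the same child node.
print(raw_sides == scaled_sides)
```

Because the two splits are equivalent, standardizing would only add a subtraction and a division to each comparison without changing the resulting tree structure.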