python, scikit-learn, h2o, h2o4gpu

Does h2o4gpu handle categorical features like sklearn or like h2o?


I understand that sklearn requires categorical features to be converted to dummy variables (one-hot encoded) before using sklearn.ensemble.RandomForestRegressor, and that XGBoost has the same requirement, whereas h2o accepts raw categorical features in its h2o.estimators.random_forest.H2ORandomForestEstimator. Since h2o4gpu's random forest implementation is built on top of XGBoost, does this mean it does not support raw categorical features?


Solution

  • h2o4gpu has no native support for categorical columns (at least not yet), so you will have to one-hot encode (or label encode) your categorical columns, just as you would for sklearn and XGBoost.
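For illustration, here is a minimal, stdlib-only sketch of the two encodings mentioned above. The helpers `label_encode` and `one_hot_encode` are hypothetical names written for this example, not part of h2o4gpu. In practice you would more likely use `pandas.get_dummies` or `sklearn.preprocessing.OneHotEncoder` / `OrdinalEncoder`, and then pass the resulting all-numeric matrix to the h2o4gpu estimator.

```python
from typing import Dict, List, Sequence, Tuple


def label_encode(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer code (sorted order for determinism)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Expand a categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical example column; any string-valued feature is handled the same way.
colors = ["red", "green", "blue", "green"]

codes, mapping = label_encode(colors)
dummies, columns = one_hot_encode(colors)

print(codes)    # [2, 1, 0, 1]
print(columns)  # ['blue', 'green', 'red']
print(dummies)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

One-hot encoding avoids implying an order between categories but adds one column per category. Label encoding keeps a single column, and tree-based models such as random forests can often cope with arbitrary integer codes, though the splits then depend on the code order.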