pythonmojodriverless-ai

How does DAI handle new (unseen in training) categorical values within a production environment?


I would like confirmation that DAI follows a similar structure for dealing with categorical variables it didn't encounter within training, as in this answer h2o DRF unseen categorical values handling. I could not find it explicitly within the H2O Driverless AI documentation.

Please also state if parts of that link are outdated (as mentioned in the answer) and how it's being processed if this is happening differently. Please note the version of h2o DAI. Thank you!


Solution

  • EDIT this information is now detailed in the documentation here

    Below is a description of what happens when you try to predict on a categorical level not seen during training. Depending on the version of DAI you use, you may not have access to a certain algorithm, but given an algorithm, the details should apply to your version of DAI.

    and

    Since DAI uses different algorithms than H2O-3 (except for XGBoost), it's best to consider these as separate products with potentially different handling of unseen levels or missing values - though in some cases there are similarities.

    As mentioned in the comment, the DRF documentation for H2O-3 should be up to date now.

    Hope this explanation helps!