
Multi-label classification using H2O.ai


We are testing the capabilities of Driverless AI. One of our first datasets has columns X1, X2, ..., X400 and Y1, Y2, ..., Y200.
We want to do multi-label classification on this dataset, predicting all of the Y variables. However, the Driverless AI web client only lets us specify a single target column.

As an alternative, I tried concatenating all the Y variables into a single list.
However, instead of predicting each Y variable separately, H2O.ai treats every distinct combination of values as its own class. For example, with 3 binary Y variables, [0 0 1], [0 1 0], and so on each become a separate class, up to 2^3 = 8 classes.
During training, it then complains that some of these classes don't have enough rows and drops them. In my case, I have over 200 Y variables, so the number of possible combinations is enormous, most of them are rare, and a lot of classes get dropped.
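To illustrate why concatenating the labels breaks down, here is a small standard-library Python sketch of this "label powerset" encoding, where each distinct label vector becomes one class. The toy data, the 10% positive rate, and the `MIN_ROWS` cutoff are hypothetical; `MIN_ROWS` is not Driverless AI's actual per-class threshold.

```python
import random
from collections import Counter

def powerset_classes(label_rows):
    """Count how many rows fall into each distinct label combination.

    This mirrors what happens when all Y columns are concatenated into a
    single target: every distinct combination becomes its own class.
    """
    return Counter(tuple(row) for row in label_rows)

random.seed(0)
N_ROWS, N_LABELS, P_POSITIVE = 1000, 3, 0.1  # hypothetical toy settings
MIN_ROWS = 20  # hypothetical per-class minimum, NOT Driverless AI's real threshold

# Toy data: each label is independently 1 with probability P_POSITIVE.
label_rows = [
    [int(random.random() < P_POSITIVE) for _ in range(N_LABELS)]
    for _ in range(N_ROWS)
]

classes = powerset_classes(label_rows)
# Combinations that would be too small to keep under the hypothetical cutoff.
rare = {combo: n for combo, n in classes.items() if n < MIN_ROWS}

print(f"possible classes: {2 ** N_LABELS}, observed: {len(classes)}")
print(f"classes with fewer than {MIN_ROWS} rows: {len(rare)}")

# The number of possible combinations doubles with every extra label.
print(f"possible classes with 200 labels: {2 ** 200:.2e}")
```

Even with only 3 labels, combinations with two or three positive labels are much rarer than the all-zero one; with 200 labels almost every combination is unique or nearly so.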

How can I do this in Driverless AI?


Solution

  • Driverless AI does not support multi-label classification at the moment. One option is to train a separate binary model for each label, which is essentially how one-vs-rest multi-class modeling works anyway. With 200 Y variables that is a lot of models, so you may want to automate it with the Python client, although training and evaluating all of them will take some time. You could start with the top 5 classes and see how they perform. It may also help to group the 200 classes into fewer categories to simplify the problem.
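A minimal sketch of the one-model-per-label approach (often called binary relevance), using only the Python standard library: it writes one training CSV per label containing all feature columns plus that single target, starting with the most frequent labels. It assumes 0/1 label columns and that "top" means the labels with the most positive rows; the helper names and toy data are hypothetical. Uploading the files and launching experiments through the Driverless AI Python client is left out here; see the client's documentation for those calls.

```python
import csv
import os
import tempfile

def label_frequencies(rows, label_cols):
    """Number of positive rows per label column (labels assumed to be 0/1)."""
    return {col: sum(int(row[col]) for row in rows) for col in label_cols}

def write_per_label_datasets(rows, feature_cols, label_cols, out_dir):
    """Write one CSV per label: all feature columns plus that single target.

    Each file can then serve as a separate single-target (binary)
    experiment, i.e. one model per label.
    """
    paths = {}
    for label in label_cols:
        path = os.path.join(out_dir, f"train_{label}.csv")
        with open(path, "w", newline="") as f:
            # extrasaction="ignore" drops the other label columns from each row.
            writer = csv.DictWriter(
                f, fieldnames=feature_cols + [label], extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(rows)
        paths[label] = path
    return paths

# Hypothetical toy data standing in for X1..X400 and Y1..Y200.
feature_cols = ["X1", "X2"]
label_cols = ["Y1", "Y2", "Y3"]
rows = [
    {"X1": 0.5, "X2": 1.2, "Y1": 1, "Y2": 0, "Y3": 1},
    {"X1": 0.1, "X2": 3.4, "Y1": 0, "Y2": 0, "Y3": 1},
    {"X1": 2.2, "X2": 0.7, "Y1": 1, "Y2": 1, "Y3": 1},
]

TOP_K = 2  # e.g. 5 on the real data; 2 here because the toy data has 3 labels
freqs = label_frequencies(rows, label_cols)
top_labels = sorted(label_cols, key=lambda c: freqs[c], reverse=True)[:TOP_K]

out_dir = tempfile.mkdtemp()
paths = write_per_label_datasets(rows, feature_cols, top_labels, out_dir)
for label, path in paths.items():
    print(label, freqs[label], path)
```

Each per-label model then predicts a probability for its own Y column; placing those predictions side by side gives a multi-label prediction for each row.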