I am trying to build stacked ensemble models using H2O Java APIs.
For this, I trained two models.
I exported these models in both MOJO and binary format, using the following code snippets.
For MOJO format export:
water.api.ModelsHandler modelsHandler = new ModelsHandler();
water.api.StreamingSchema streamingSchema = modelsHandler.fetchMojo(3, modelsV3); //water.api.schemas3.ModelsV3
For binary format export:
water.api.ModelsHandler modelsHandler = new ModelsHandler();
modelsHandler.exportModel(3, modelExport); //water.api.schemas3.ModelExportV3
I also exported the cross-validation holdout data, since it is required later to train the stacked ensemble models.
Let's assume my exported model names and their holdout data names are as follows:
ModelName: StackGBMReg1 CVDataName: cv_holdout_prediction_StackGBMReg1
ModelName: StackDRFReg1 CVDataName: cv_holdout_prediction_StackDRFReg1
Training Stacked Ensemble Models
Later, I import these models and their CV holdout data into the H2O server to train the stacked ensemble models. Here are the code snippets for this operation:
To import holdout data:
ImportFilesV3 importFile = h2o.importFiles(workingDir + fileName); // fileName: cv_holdout_prediction_StackGBMReg1 or cv_holdout_prediction_StackDRFReg1
To import model:
ModelsHandler modelsHandler = new ModelsHandler();
water.api.schemas3.ModelsV3 importedModel = modelsHandler.importModel(3, modelImport); //water.api.schemas3.ModelImportV3
I get the following error when I try to import the MOJO models:
H2OException: Error while importing model : StackGBMReg1.zip
at ImportAndScore.importModel(ImportAndScore.java:306)
at ImportAndScore.main(ImportAndScore.java:61)
Caused by: java.lang.IllegalArgumentException: Missing magic number 0x1CED at stream start
at water.AutoBuffer.<init>(AutoBuffer.java:287)
at hex.Model.importBinaryModel(Model.java:2380)
at water.api.ModelsHandler.importModel(ModelsHandler.java:209)
at ImportAndScore.importModel(ImportAndScore.java:302)
... 1 more
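The trace shows importModel() handing the file to hex.Model.importBinaryModel(), which rejects it because the stream does not start with the marker 0x1CED. That fits a MOJO, which is a ZIP archive and therefore starts with the ZIP signature "PK" instead. As a minimal sketch (the class and method names here are my own, not part of H2O), a file can be checked for the ZIP signature before it is passed to the binary importer:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not an H2O class: tells a ZIP archive (such as a MOJO)
// apart from other files by looking at the first four bytes.
final class ModelFileSniffer {

    /**
     * Returns true if the file starts with the ZIP local-file-header
     * signature "PK\003\004". An archive with no entries starts with a
     * different signature, but an exported MOJO always contains entries.
     */
    static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than four bytes
                }
                read += n;
            }
        }
        return head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }
}
```

If looksLikeZip() returns true for StackGBMReg1.zip, the file is an archive and not a binary model, so the failure in importBinaryModel() is expected rather than a sign of a corrupted export.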
According to the reply I got on the H2O forum, importing MOJO models is not supported, which I find really strange.
To work around this, I imported the binary models instead, which succeeded. I then trained the stacked ensemble models, and that worked fine.
My questions are:
1. Since MOJO model import does not work with ModelsHandler.importModel(), is there another API or workaround that would let me import a MOJO model into H2O?
2. Can we convert POJO or MOJO models into binary models for import purposes?
3. According to the last reply from H2O, binary models are not backward compatible. So if I upgrade H2O later, my older trained models will not work for training new stacked ensemble models. In fact, they will fail at the import step itself.
a. Is there a way to use binary models without the backward compatibility issue?
b. If binary models are the only way to go, is my approach right for training stacked ensemble models (using previously exported/saved models)?
c. Am I likely to face any other issues with binary models in the future that I don't foresee now?
My main concern is to get rid of the backward compatibility issue. If there is any way to get around it, that would really help my work. Since I am using Java code, I wouldn't mind using an internal H2O API that is not directly exposed.
Please note that I am not talking about loading MOJO models for scoring. I understand that MOJO models can easily be used for scoring, as described at this link: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html
Responding to your questions inline:
1. Since MOJO model import does not work with ModelsHandler.importModel(), is there another API or workaround that would let me import a MOJO model into H2O?
No. The MOJO format was designed to make it easier to put a model into production.
2. Can we convert POJO or MOJO models into binary models for import purposes? No.
3. According to the last reply from H2O, binary models are not backward compatible. So if I upgrade H2O later, my older trained models will not work for training new stacked ensemble models. In fact, they will fail at the import step itself.
Yes. You will only be able to load and use a saved binary model with the same version of H2O that you used to train it. H2O binary models are not compatible across H2O versions.
a. Is there a way to use binary models without the backward compatibility issue? No.
b. If binary models are the only way to go, is my approach right for training stacked ensemble models (using previously exported/saved models)? Yes.
c. Am I likely to face any other issues with binary models in the future that I don't foresee now? Backward compatibility is the main issue.
Please note that H2O.ai might support reading MOJOs into H2O-3 in the future, but currently there is no timetable for it.
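Since a binary model can only be loaded by the H2O version that produced it, it may help to record that version next to each exported file and check it before calling importModel(). The following is only a sketch under my own assumptions: the ".h2o-version" sidecar file and the class below are a hypothetical convention, not an H2O feature, and the version strings are passed in by the caller because how you obtain them depends on your setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not an H2O class: keeps the exporting H2O version in a
// sidecar file so a version mismatch is caught before the import is attempted.
final class ModelVersionGuard {

    /** Sidecar path for a model file, e.g. "StackGBMReg1" -> "StackGBMReg1.h2o-version". */
    static Path sidecarFor(Path modelFile) {
        return Paths.get(modelFile.toString() + ".h2o-version");
    }

    /** Call right after exporting the binary model. */
    static void recordVersion(Path modelFile, String h2oVersion) throws IOException {
        Files.write(sidecarFor(modelFile), h2oVersion.trim().getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Call before importing. Returns true only if a sidecar exists and its
     * recorded version equals the version of the running H2O instance.
     */
    static boolean isImportable(Path modelFile, String runningH2oVersion) throws IOException {
        Path sidecar = sidecarFor(modelFile);
        if (!Files.exists(sidecar)) {
            return false; // unknown origin: treat as not safe to import
        }
        String recorded = new String(Files.readAllBytes(sidecar), StandardCharsets.UTF_8).trim();
        return recorded.equals(runningH2oVersion.trim());
    }
}
```

This does not remove the compatibility restriction. It only turns a failure at import time into an explicit check, so that models exported by an older version can be retrained or kept alongside the H2O version that produced them.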