
r2pmml conversion of 300 MB randomForest fails


I have a randomForest model in R that I'd like to convert to PMML.

    library(r2pmml)

    load("rf.RData")  # defines the randomForest object rf
    r2pmml(rf, "file.pmml", compact = TRUE)

gives the following result:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.jpmml.rexp.RExpParser.readIntVector(RExpParser.java:269)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:88)
        at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:329)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:97)
        at org.jpmml.rexp.RExpParser.readVector(RExpParser.java:329)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:97)
        at org.jpmml.rexp.RExpParser.parse(RExpParser.java:53)
        at com.r2pmml.Main.run(Main.java:83)
        at com.r2pmml.Main.main(Main.java:71)
    Error in .convert(tempfile, file, converter, converter_classpath, verbose) :
      The R2PMML conversion application has failed (error code 1). The Java executable should have printed more information about the failure into its standard output and/or standard error streams

It looks like a memory problem. My laptop has 8 GB of RAM, the randomForest model is ~300 MB, R is version 4.3.2, and r2pmml is version 0.27.1.

I added options(java.parameters = c("-Xms2G", "-Xmx8G")) at the start of the R code to increase the memory available to Java, and changed the code to:

    load("rf.RData")
    rf <- decorate(rf, compact = FALSE)  # decorate() returns a modified copy
    r2pmml(rf, "file.pmml", compact = TRUE)

but it didn't change the outcome.

What now? Is there a way to convert the model on my laptop? If not, is there a simple (!) way to do it in the cloud?


Solution

  • Dump the model into a file on the local filesystem using R's built-in RDS data format (saveRDS()). Then use the JPMML-R command-line application to perform the RDS-to-PMML conversion.

    You can adjust JPMML-R's memory usage using standard Java/JVM command-line options:

    $ java -Xms2G -Xmx8G -jar pmml-rexp-example-executable-${version}.jar --rds-input RF.rds --pmml-output RF.pmml
    

    Also, when dealing with large RF models, it is advisable to activate model compaction (i.e., compact = TRUE). However, the compaction pass runs after the standard conversion pass, so the in-memory model object keeps its original memory requirements; only the resulting PMML document shrinks, to roughly half the size.
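The two steps above (dump to RDS, then run the JPMML-R jar with explicit heap options) can also be driven from a single R script. This is a minimal sketch, not part of r2pmml or JPMML-R: the helper convert_rds_to_pmml() and its arguments are hypothetical, and it assumes a java executable on the PATH and a locally available JPMML-R executable jar. Because it launches a separate JVM with its own -Xms/-Xmx flags, those heap settings apply directly to the conversion process.

```r
# Hypothetical helper (not an r2pmml/JPMML-R API): saves a model as RDS and
# runs the JPMML-R command-line converter in a separate JVM.
convert_rds_to_pmml <- function(model, jar,
                                rds_file = "RF.rds", pmml_file = "RF.pmml",
                                xms = "2G", xmx = "8G", run = TRUE) {
  # Step 1: dump the model in R's native RDS format
  saveRDS(model, file = rds_file)

  # Step 2: build the java command line; JVM heap options must come before -jar
  args <- c(paste0("-Xms", xms), paste0("-Xmx", xmx),
            "-jar", jar,
            "--rds-input", rds_file,
            "--pmml-output", pmml_file)

  # Step 3: optionally launch the JVM; system2() returns the exit status
  status <- if (run) system2("java", args) else NA_integer_
  invisible(list(args = args, status = status))
}

# Example usage (replace ${version} with the actual version of your jar):
# load("rf.RData")
# res <- convert_rds_to_pmml(rf, jar = "pmml-rexp-example-executable-${version}.jar")
# res$status  # exit status of the java process
```

With run = FALSE the function only writes the RDS file and returns the argument vector, which is a cheap way to check the command line before starting a long conversion.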