pyspark databricks pmml

SynapseML LightGBM to PMML


According to the SynapseML documentation, a LightGBM model can be exported to PMML. The link to the package to install is located here. However, I am unable to install that package using the Maven path specified; Databricks just shows a red X. So next I tried to install

pmml-sparkml-lightgbm

but I get the following error:

Transformer class com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel is not supported

Is there a better way to go about this? Should I use a different LightGBM implementation instead of the SynapseML one?

Thanks

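from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from synapse.ml.lightgbm import LightGBMClassifier
from pyspark2pmml import PMMLBuilder

# Assumed defined elsewhere: spark, df, categoricalColumns, numericColsFeatures, ts, dateFormat

# Index each categorical column; "keep" maps unseen labels to an extra index
stages = []
for categoricalCol in categoricalColumns:
    indexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "_Index").setHandleInvalid("keep")
    stages += [indexer]

# Combine the indexed categorical columns and the numeric columns into one feature vector
assemblerInputs = [c + "_Index" for c in categoricalColumns] + numericColsFeatures
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]

# SynapseML LightGBM binary classifier
lgbm = LightGBMClassifier(objective="binary", featuresCol="features", labelCol="label",
                          learningRate=0.3, numIterations=100, numLeaves=31)
stages += [lgbm]

pipeline = Pipeline(stages=stages)
print("Running model")
pipelineModel = pipeline.fit(df)

# Export the fitted pipeline to PMML (requires the JPMML-SparkML jars on the cluster)
pmmlBuilder = PMMLBuilder(spark.sparkContext, df, pipelineModel)
pmmlBuilder.buildFile("/dbfs/tmp/pmmlModel" + ts.strftime(dateFormat) + "_test.pmml")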

Solution

  • The JPMML-SparkML library has included a dedicated org.jpmml:pmml-sparkml-lightgbm module for some time now. Add it to your Apache Spark package path using the --packages option:

    $ $SPARK_HOME/bin/spark-submit --packages "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2,org.jpmml:pmml-sparkml-lightgbm:2.4.0" myscript.py
    

    This module does not need any special configuration when being accessed from within PySpark (as opposed to "plain" Apache Spark).

    The JPMML-SparkML library is distributed via the Maven Central repository only. It is not pushed to proprietary repositories such as Databricks', which may explain the "red X" that you are seeing.
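The --packages option shown above takes a comma-separated list of Maven coordinates in groupId:artifactId:version form. The sketch below assembles that string from the two coordinates in the spark-submit command. The helper build_packages_value is hypothetical (it is not part of Spark, SynapseML or JPMML), and the extra check that a "_2.12"-style artifact suffix matches the cluster's Scala binary version is an assumption added as a sanity check, not something the answer requires.

```python
# Hypothetical helper (not part of Spark, SynapseML or JPMML): validates Maven
# coordinates and joins them into the comma-separated string that the
# --packages option (and the spark.jars.packages setting) expects.

def build_packages_value(coordinates, scala_binary_version="2.12"):
    """Return a comma-separated --packages value.

    Each coordinate must look like groupId:artifactId:version. If the
    artifactId ends in a Scala suffix such as "_2.12", it is compared with
    scala_binary_version (assumed to be the cluster's Scala version).
    """
    checked = []
    for coord in coordinates:
        parts = coord.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected groupId:artifactId:version, got {coord!r}")
        artifact = parts[1]
        if "_" in artifact:
            suffix = artifact.rsplit("_", 1)[1]
            # Treat the suffix as a Scala version only if it looks like "2.12"
            if suffix.replace(".", "").isdigit() and suffix != scala_binary_version:
                raise ValueError(
                    f"{coord!r} targets Scala {suffix}, expected {scala_binary_version}"
                )
        checked.append(coord)
    return ",".join(checked)


# Coordinates copied from the spark-submit command in the answer
packages = build_packages_value([
    "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2",
    "org.jpmml:pmml-sparkml-lightgbm:2.4.0",
])
print(packages)
```

Because spark.jars.packages is read when the JVM starts, a value like this is meant to be supplied at launch time (on the command line or in the cluster configuration), not from inside an already running notebook session.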