According to the SynapseML documentation, we can export an LGBM model to PMML. The link to the package to install is located here. However, I am unable to install that package using the Maven path specified; it just shows a red X in Databricks. So next I tried to install
but I am getting an error:
Transformer class com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel is not supported
Is there a better way to go about this? Should I use a different LGBM other than the SynapseML version?
Thanks
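from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark2pmml import PMMLBuilder
from synapse.ml.lightgbm import LightGBMClassifier

# Index every categorical column; "keep" puts unseen values in an extra bucket
stages = []
for categoricalCol in categoricalColumns:
    indexers = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "_Index").setHandleInvalid("keep")
    stages += [indexers]
# Combine the indexed categorical columns and the numeric columns into one vector
assemblerInputs = [c + "_Index" for c in categoricalColumns] + numericColsFeatures
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]
lgbm = LightGBMClassifier(objective="binary", featuresCol="features", labelCol="label", learningRate=0.3, numIterations=100, numLeaves=31)
stages += [lgbm]
pipeline = Pipeline(stages=stages)
print('Running model')
pipelineModel = pipeline.fit(df)
# Convert the fitted pipeline to PMML; this is the step that raises the error
pmmlBuilder = PMMLBuilder(spark.sparkContext, df, pipelineModel)
pmmlBuilder.buildFile("/dbfs/tmp/pmmlModel" + ts.strftime(dateFormat) + "_test.pmml")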
The JPMML-SparkML library has included a dedicated org.jpmml:pmml-sparkml-lightgbm module for quite some time now. Simply add it to your Apache Spark application classpath using the --packages option:
$ $SPARK_HOME/bin/spark-submit --packages "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2,org.jpmml:pmml-sparkml-lightgbm:2.4.0" myscript.py
This module does not need any special configuration when accessed from within PySpark (as opposed to "plain" Apache Spark).
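If the script creates its own Spark session rather than being launched through spark-submit, the same two coordinates can usually be passed through the standard spark.jars.packages setting. The following is only a minimal sketch under that assumption: the helper names packages_value and create_session are hypothetical, and the versions are simply the ones from the command above. On a managed cluster where the session already exists, the setting will likely have no effect and the libraries would need to be attached to the cluster instead.

```python
# Maven coordinates copied from the spark-submit command above.
SYNAPSEML_LIGHTGBM = "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2"
JPMML_SPARKML_LIGHTGBM = "org.jpmml:pmml-sparkml-lightgbm:2.4.0"


def packages_value(*coordinates):
    """Join Maven coordinates into the comma-separated form Spark expects.

    Hypothetical helper, used here only for illustration.
    """
    return ",".join(c.strip() for c in coordinates)


def create_session(app_name="lightgbm-pmml-export"):
    """Create a SparkSession that resolves both packages from Maven Central.

    Hypothetical helper. spark.jars.packages is only honoured when it is set
    before the JVM starts, i.e. before any SparkSession exists in the process.
    """
    # Imported lazily so the coordinate helper can be used without pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config(
            "spark.jars.packages",
            packages_value(SYNAPSEML_LIGHTGBM, JPMML_SPARKML_LIGHTGBM),
        )
        .getOrCreate()
    )
```

Calling create_session() in place of an existing session setup should make the LightGBM converter available to PMMLBuilder, provided the machine can reach Maven Central.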
The JPMML-SparkML library is distributed via the Maven Central repository only. It is not pushed to proprietary repositories such as Databricks, which may explain the "red X" that you're seeing.