scikit-learn pmml

SkLearn2PMML can't determine the number of output fields


from sklearn.datasets import load_breast_cancer

# Load dataset
data = load_breast_cancer()

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

from sklearn.model_selection import train_test_split

# Split our data
train, test, train_labels, test_labels = train_test_split(features, labels, test_size=0.33, random_state=42)

from sklearn.naive_bayes import GaussianNB
from sklearn2pmml import PMMLPipeline

nb_pipeline = PMMLPipeline([
  ('classifier', GaussianNB())
])

# Train our classifier
nb_pipeline.fit(train, train_labels)

from sklearn2pmml import sklearn2pmml

# Convert the fitted pipeline to PMML
sklearn2pmml(nb_pipeline, 'nb.pmml', with_repr=True, debug=True)

Error traceback:

Exception in thread "main" java.lang.IllegalArgumentException: The estimator object of the final step (Python class sklearn.naive_bayes.GaussianNB) does not specify the number of outputs
    at sklearn2pmml.pipeline.PMMLPipeline.initTargetFields(PMMLPipeline.java:564)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:132)
    at com.sklearn2pmml.Main.run(Main.java:91)
    at com.sklearn2pmml.Main.main(Main.java:66)

While debugging, I found that the model nb_pipeline does not contain the key "target_fields". If I want to use GaussianNB, how can I convert the model to PMML? I hope to get some ideas, thanks!


Solution

  • This question has been answered in jpmml/sklearn2pmml#357

    In brief, the user appears to be using a legacy Scikit-Learn version, which doesn't set the Estimator.n_outputs_ attribute. SkLearn2PMML package versions 0.86.X expect this attribute to be present.

    This error can be solved either by upgrading Scikit-Learn to 1.1.X (or newer), or SkLearn2PMML to 0.87.0 (or newer).
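
A quick way to see whether an environment falls into the problematic combination is to compare the installed versions against the thresholds named above. The following is only a minimal sketch: the helper names (version_tuple, combination_is_ok, installed_version) are hypothetical, and the thresholds (Scikit-Learn 1.1, SkLearn2PMML 0.87.0) are taken from this answer rather than from the library's own compatibility checks.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def combination_is_ok(sklearn_version, sklearn2pmml_version):
    """True if either package meets the threshold quoted in the answer (assumed)."""
    return (
        version_tuple(sklearn_version) >= (1, 1)
        or version_tuple(sklearn2pmml_version) >= (0, 87)
    )


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the current environment, if both packages are installed.
sk_version = installed_version("scikit-learn")
s2p_version = installed_version("sklearn2pmml")
if sk_version is None or s2p_version is None:
    print("scikit-learn and/or sklearn2pmml is not installed")
elif combination_is_ok(sk_version, s2p_version):
    print("Version combination should not trigger this error")
else:
    print("Upgrade scikit-learn to 1.1+ or sklearn2pmml to 0.87.0+")
```

If the last message is printed, upgrading either package with pip (for example, pip install --upgrade sklearn2pmml) should be enough according to the answer above.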