from sklearn.datasets import load_breast_cancer
# Load dataset
data = load_breast_cancer()
# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']
from sklearn.model_selection import train_test_split
# Split our data
train, test, train_labels, test_labels = train_test_split(features, labels, test_size=0.33, random_state=42)
from sklearn.naive_bayes import GaussianNB
from sklearn2pmml import PMMLPipeline
# Wrap the classifier in a PMML-aware pipeline
nb_pipeline = PMMLPipeline([
    ('classifier', GaussianNB())
])
# Train our classifier
nb_pipeline.fit(train, train_labels)
# Convert the fitted pipeline to PMML
from sklearn2pmml import sklearn2pmml
sklearn2pmml(nb_pipeline, 'nb.pmml', with_repr=True, debug=True)
Error traceback:
Exception in thread "main" java.lang.IllegalArgumentException: The estimator object of the final step (Python class sklearn.naive_bayes.GaussianNB) does not specify the number of outputs
at sklearn2pmml.pipeline.PMMLPipeline.initTargetFields(PMMLPipeline.java:564)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:132)
at com.sklearn2pmml.Main.run(Main.java:91)
at com.sklearn2pmml.Main.main(Main.java:66)
While debugging, I found that the model nb_pipeline does not contain the key "target_fields". If I want to use GaussianNB, how can I convert the model to PMML? Hope to get some ideas, thanks!
This question has been answered in jpmml/sklearn2pmml#357
In brief, the user appears to be using a legacy Scikit-Learn version that does not set the Estimator.n_outputs_ attribute, which SkLearn2PMML package versions 0.86.X expect to find.
The error can be resolved by upgrading either Scikit-Learn to 1.1.X (or newer) or SkLearn2PMML to 0.87.0 (or newer).
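To see whether an environment falls below the versions mentioned above, a quick check along these lines may help. This is only a sketch: the helper names (`parse_version`, `needs_upgrade`, `installed_version`) are made up for illustration, and it assumes that any SkLearn2PMML release below 0.87.0 combined with Scikit-Learn below 1.1 is affected, which is broader than the 0.86.X range named in the answer.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(sklearn_version, sklearn2pmml_version):
    """True when both packages are older than the thresholds quoted in the answer.

    Assumption: every sklearn2pmml release below 0.87.0 is treated as affected.
    """
    old_sklearn = parse_version(sklearn_version) < (1, 1)
    old_converter = parse_version(sklearn2pmml_version) < (0, 87, 0)
    return old_sklearn and old_converter


def installed_version(package):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    sk = installed_version("scikit-learn")
    s2p = installed_version("sklearn2pmml")
    print("scikit-learn:", sk, "| sklearn2pmml:", s2p)
    if sk and s2p and needs_upgrade(sk, s2p):
        print("Consider upgrading scikit-learn to 1.1+ or sklearn2pmml to 0.87.0+")
```

If the check reports an affected combination, upgrading either package with pip (for example `pip install --upgrade sklearn2pmml`) should be enough; both do not need to be upgraded.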