I export an instance of sklearn.preprocessing.StandardScaler
to a PMML file. The problem is that the field names do not appear in the PMML file. For example, with the iris dataset the original field names ['sepal length (cm)','sepal width (cm)','petal length (cm)','petal width (cm)']
do not appear; only generic names like x1, x2, etc. appear instead. Is there a way to get the original field names into the PMML file?
The following code should be runnable:
from sklearn2pmml import sklearn2pmml, PMMLPipeline, make_pmml_pipeline
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import pandas as pd
data = load_iris()
dfIris = pd.DataFrame(data=data.data, columns=data.feature_names)
ssModel = StandardScaler()
ssModel.fit(dfIris)
pipe = PMMLPipeline([("StandardScaler", ssModel)])
sklearn2pmml(pipeline=make_pmml_pipeline(pipe), pmml="ssIris.pmml")
First, I believe you want to fit the PMMLPipeline after initializing it, i.e. call pipe.fit(dfIris)
instead of fitting ssModel on its own beforehand. To preserve the column names, add a pass-through preprocessing step before the scaler: a DataFrameMapper (from sklearn_pandas) that maps each pandas DataFrame column to the None transformer, so every column passes through unchanged. The pipeline seems to need such a preprocessing step in order to keep the column names. I am not sure whether this is the best way, but I checked it and it preserved the column names.
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml, PMMLPipeline, make_pmml_pipeline
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import pandas as pd
data = load_iris()
dfIris = pd.DataFrame(data=data.data, columns=data.feature_names)
ssModel = StandardScaler()
# Pass-through mapper: each column is mapped to None (no transformation), so its name is kept
pipe = PMMLPipeline([("df_mapper",
                      DataFrameMapper([(d, None) for d in data.feature_names],
                                      df_out=True)),
                     ("StandardScaler", ssModel)])
# Fit the whole pipeline on the DataFrame instead of fitting ssModel separately
pipe.fit(dfIris)
sklearn2pmml(pipeline=make_pmml_pipeline(pipe), pmml="ssIris.pmml")
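To check which field names actually ended up in the exported file, a small standard-library helper can list the DataField names from the PMML DataDictionary. The function name pmml_field_names is hypothetical; the sketch only assumes the usual PMML layout where input fields are declared as DataField elements with a name attribute.

```python
import xml.etree.ElementTree as ET
from typing import List


def pmml_field_names(pmml_text: str) -> List[str]:
    """Return the name attribute of every DataField element in a PMML document."""
    root = ET.fromstring(pmml_text)
    names = []
    for elem in root.iter():
        # Tags are namespace-qualified, e.g. "{http://www.dmg.org/PMML-4_4}DataField",
        # so compare only the local name to stay independent of the PMML version.
        if elem.tag.rsplit("}", 1)[-1] == "DataField":
            names.append(elem.get("name"))
    return names
```

Usage: open ssIris.pmml with encoding="utf-8", pass f.read() to pmml_field_names and print the result; seeing x1, x2, ... instead of the iris column names means the names were lost during export.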