I want to convert a logistic model built by the mlr-package directly into a XML-file using the package pmml. The problem is that the model.learner built by the mlr wrapper doesn't include the model link in the list, like it is in the normal stats::glm function. So here is an example:
library(dplyr)
library(titanic)
library(pmml)
library(ParamHelpers)
library(mlr)
Titanic_data = select(titanic_train, Survived, Pclass, Sex, Age)
Titanic_data$Survived = as.factor(Titanic_data$Survived)
Titanic_data$Sex = as.factor(Titanic_data$Sex)
Titanic_data$Pclass = as.factor(Titanic_data$Pclass)
Titanic_data = na.omit(Titanic_data)
lrn <- makeLearner("classif.logreg", predict.type = "prob")
task = makeClassifTask(data = Titanic_data, target = "Survived", positive = "1")
model = train(lrn, task)
model_glm = glm(Survived ~ ., data = Titanic_data, family = "binomial")
str(model$learner.model) # list of 29
str(model_glm) # list of 30
As you can see, the structure of both models is a list of different elements and they are all the same, beside the fact that the model is missing in the wrapper. Therefore I get an error message using pmml:
pmml(model_glm)
# Error in pmml.glm(model$learner.model) : object 'model.link' not found
The one built by stats::glm is working:
pmml(model)
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_4 http://www.dmg.org/pmml/v4-4/pmml-4-4.xsd">
<Header copyright="Copyright (c) 2020 TBeige" description="Generalized Linear Regression Model">
<Extension name="user" value="TBeige" extender="SoftwareAG PMML Generator"/>
<Application name="SoftwareAG PMML Generator" version="2.3.1"/>
<Timestamp>2020-05-12 09:50:15</Timestamp>
</Header>
<DataDictionary numberOfFields="4">
<DataField name="Survived" optype="categorical" dataType="string">
<Value value="0"/>
<Value value="1"/>
</DataField>
<DataField name="Pclass" optype="categorical" dataType="string">
<Value value="1"/>
<Value value="2"/>
<Value value="3"/>
</DataField>
<DataField name="Sex" optype="categorical" dataType="string">
<Value value="female"/>
<Value value="male"/>
</DataField>
<DataField name="Age" optype="continuous" dataType="double"/>
</DataDictionary>
<GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="classification" algorithmName="glm" distribution="binomial" linkFunction="logit">
<MiningSchema>
<MiningField name="Survived" usageType="predicted" invalidValueTreatment="returnInvalid"/>
<MiningField name="Pclass" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="Sex" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="Age" usageType="active" invalidValueTreatment="returnInvalid"/>
</MiningSchema>
<Output>
<OutputField name="Probability_1" targetField="Survived" feature="probability" value="1" optype="continuous" dataType="double"/>
<OutputField name="Predicted_Survived" feature="predictedValue" optype="categorical" dataType="string"/>
</Output>
<ParameterList>
<Parameter name="p0" label="(Intercept)"/>
<Parameter name="p1" label="Pclass2"/>
<Parameter name="p2" label="Pclass3"/>
<Parameter name="p3" label="Sexmale"/>
<Parameter name="p4" label="Age"/>
</ParameterList>
<FactorList>
<Predictor name="Pclass"/>
<Predictor name="Sex"/>
</FactorList>
<CovariateList>
<Predictor name="Age"/>
</CovariateList>
<PPMatrix>
<PPCell value="2" predictorName="Pclass" parameterName="p1"/>
<PPCell value="3" predictorName="Pclass" parameterName="p2"/>
<PPCell value="male" predictorName="Sex" parameterName="p3"/>
<PPCell value="1" predictorName="Age" parameterName="p4"/>
</PPMatrix>
<ParamMatrix>
<PCell targetCategory="1" parameterName="p0" df="1" beta="3.77701265255885"/>
<PCell targetCategory="1" parameterName="p1" df="1" beta="-1.30979926778885"/>
<PCell targetCategory="1" parameterName="p2" df="1" beta="-2.58062531749203"/>
<PCell targetCategory="1" parameterName="p3" df="1" beta="-2.52278091988034"/>
<PCell targetCategory="1" parameterName="p4" df="1" beta="-0.0369852655754339"/>
</ParamMatrix>
</GeneralRegressionModel>
</PMML>
Any idea how I can use mlr and creating a xml find using pmml?
The problem seems to be inside pmml
From pmml::pmml.glm
:
if (model$call[[1]] == "glm") {
model.type <- model$family$family
model.link <- model$family$link
}
else {
model.type <- "unknown"
}
In the mlr model we have
model$learner.model$call[[1]]
# stats::glm
So you can just hack
model$learner.model$call[[1]] = "glm"
and then
pmml(model$learner.model)
works.
To be honest it seems to be weird code in the pmml
package.