I am building a document classification system using scikit-learn, and it works fine. I am converting the model to the Core ML model format, but the converted model accepts only a multiArrayType input. I want it to accept a string or an array of strings so that I can easily make predictions from an iOS application. I have tried the following:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(X_train_dtm, y_train)
# Test a value (count_vect is the fitted vectorizer used to build X_train_dtm)
docs_new = ['get exclusive prize offer']
docs_pred_class = logreg.predict(count_vect.transform(docs_new))
# Export to Core ML
import coremltools
coreml_model = coremltools.converters.sklearn.convert(logreg)
#print model
coreml_model
Printing the Core ML model gives the following output:
input {
  name: "input"
  type {
    multiArrayType {
      shape: 7505
      dataType: DOUBLE
    }
  }
}
output {
  name: "classLabel"
  type {
    int64Type {
    }
  }
}
output {
  name: "classProbability"
  type {
    dictionaryType {
      int64KeyType {
      }
    }
  }
}
predictedFeatureName: "classLabel"
predictedProbabilitiesName: "classProbability"
I checked a Core ML model from a GitHub library, and I can see that it has different input and output types.
How can I achieve this, so that I can pass a simple parameter from the iOS app to make a prediction?
It sounds like that other mlmodel you found uses a DictVectorizer to turn the strings into indexes (possibly followed by a OneHotEncoder).
You can do this by making a pipeline in sklearn and converting that pipeline to Core ML.
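As a rough sketch of what such a pipeline could look like: the snippet below feeds word-count dictionaries into a DictVectorizer followed by LogisticRegression. The training data, the word_counts helper, the export_to_coreml wrapper, the feature names "wordCounts" and "classLabel", and the output file name are all made up for illustration. The conversion call assumes that your coremltools version still ships the sklearn converter and supports your installed scikit-learn version, so treat that part as untested.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def word_counts(text):
    """Hypothetical helper: turn a document into a {word: count} dict."""
    return dict(Counter(text.lower().split()))


# Toy training data, invented purely for illustration.
docs = [
    "get exclusive prize offer",
    "win a free prize now",
    "meeting moved to monday",
    "see you at lunch tomorrow",
]
labels = [1, 1, 0, 0]

# The DictVectorizer maps each word to a column index, so the pipeline
# takes dictionaries as input instead of a fixed-length array.
pipeline = Pipeline([
    ("vectorizer", DictVectorizer()),
    ("classifier", LogisticRegression()),
])
pipeline.fit([word_counts(d) for d in docs], labels)

# Predict on a new document through the same dictionary representation.
prediction = pipeline.predict([word_counts("exclusive prize offer")])


def export_to_coreml(fitted_pipeline, path="DocumentClassifier.mlmodel"):
    """Convert the whole pipeline; assumes a compatible coremltools install."""
    import coremltools  # imported here so the rest runs without it

    # "wordCounts" and "classLabel" are illustrative feature names.
    model = coremltools.converters.sklearn.convert(
        fitted_pipeline, "wordCounts", "classLabel"
    )
    model.save(path)
    return model
```

With a pipeline like this, the converted model should take a dictionary of word counts rather than a multi-array. The app would still need to split the text into words and count them before calling the model, since the model does not receive a raw string.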