Is there any way to extract the auto-generated machine learning pipeline from auto-sklearn as a standalone Python script?
Here is some sample code using auto-sklearn:
import autosklearn.classification
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
digits = sklearn.datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_hat = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))
It would be nice if equivalent Python code could be generated automatically.
By comparison, when using TPOT we can obtain the standalone pipeline as follows:
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, train_size=0.75, test_size=0.25)
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot-mnist-pipeline.py')
Inspecting tpot-mnist-pipeline.py then shows the entire ML pipeline:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = np.recfromcsv('PATH/TO/DATA/FILE', delimiter='COLUMN_SEPARATOR')
features = tpot_data.view((np.float64, len(tpot_data.dtype.names)))
features = np.delete(features, tpot_data.dtype.names.index('class'), axis=1)
training_features, testing_features, training_classes, testing_classes = train_test_split(features, tpot_data['class'], random_state=42)
exported_pipeline = make_pipeline(
    KNeighborsClassifier(n_neighbors=3, weights="uniform")
)
exported_pipeline.fit(training_features, training_classes)
results = exported_pipeline.predict(testing_features)
The examples above relate to an existing post on automating somewhat shallow machine learning, found here.
There is no automated way to do this. You can, however, store the fitted object in pickle format and load it later:
import pickle
# Serialize the fitted AutoSklearnClassifier to disk
with open('automl.pkl', 'wb') as output:
    pickle.dump(automl, output)
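Loading the object back is the mirror image of saving it. The following is a minimal sketch of the round trip. The helper names `save_model` and `load_model` are hypothetical, and a plain dict stands in for the fitted `automl` object so that the example runs without auto-sklearn installed.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object (e.g. a fitted estimator) to disk."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Restore a pickled object. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for the fitted automl object (assumption: any picklable object works).
stand_in = {"steps": ["KNeighborsClassifier"], "n_neighbors": 3}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "automl.pkl"
    save_model(stand_in, model_path)
    restored = load_model(model_path)

print(restored == stand_in)
```

With the real object, this would be `save_model(automl, "automl.pkl")` in the training script and `load_model("automl.pkl").predict(X_test)` in the consuming script. Pickles are not guaranteed to load across different library versions, so the loading environment should have the same auto-sklearn and scikit-learn versions installed.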
To see what is going on internally, you can step through the fit or predict methods in a debugger.
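Short of exporting code, a fitted auto-sklearn estimator can at least describe what it found. The sketch below wraps two of its methods, `sprint_statistics()` and `show_models()`, in a helper. The helper name `describe_automl` is hypothetical, and the return format of `show_models()` differs between auto-sklearn versions, so treat the output as something to read rather than parse.

```python
def describe_automl(automl):
    """Collect summaries from a fitted auto-sklearn estimator.

    Assumes `automl` has already been fitted and provides the
    sprint_statistics() and show_models() methods.
    """
    return {
        # Text summary of the search: number of runs, best validation score, etc.
        "statistics": automl.sprint_statistics(),
        # Description of the models in the final ensemble (format is version dependent).
        "models": automl.show_models(),
    }


# Usage with the fitted object from the question (requires auto-sklearn):
#   summary = describe_automl(automl)
#   print(summary["statistics"])
#   print(summary["models"])
```

The reported models and hyperparameters can then be used to rebuild an equivalent scikit-learn pipeline by hand, which is the closest substitute for TPOT's export.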