[SOLVED] Auto-Machine-Learning python equivalent code

Is there any way to extract the auto-generated machine learning pipeline from auto-sklearn as a standalone Python script?

Here is sample code using auto-sklearn:

import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

digits = sklearn.datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_hat = automl.predict(X_test)

print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))

It would be nice to have equivalent Python code generated automatically.

By comparison, when using TPOT we can obtain the standalone pipeline as follows:

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)

print(tpot.score(X_test, y_test))

tpot.export('tpot-mnist-pipeline.py')

And when inspecting tpot-mnist-pipeline.py the entire ML pipeline can be seen:

import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = np.recfromcsv('PATH/TO/DATA/FILE', delimiter='COLUMN_SEPARATOR')
features = tpot_data.view((np.float64, len(tpot_data.dtype.names)))
features = np.delete(features, tpot_data.dtype.names.index('class'), axis=1)
training_features, testing_features, training_classes, testing_classes = \
    train_test_split(features, tpot_data['class'], random_state=42)

exported_pipeline = make_pipeline(
    KNeighborsClassifier(n_neighbors=3, weights="uniform")
)

exported_pipeline.fit(training_features, training_classes)
results = exported_pipeline.predict(testing_features)

The examples above relate to an existing post on automating somewhat shallow machine learning, found here.

Solution

There is no automated way to export the pipeline as standalone code. You can, however, store the fitted object in pickle format and load it later.

import pickle

with open('automl.pkl', 'wb') as output:
    pickle.dump(automl, output)
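
Saving is only half of the round trip, so here is a minimal sketch of both directions. The helper names `save_model` and `load_model` are hypothetical (they are not part of auto-sklearn), and a plain dict stands in for the fitted `automl` object so the sketch runs without auto-sklearn installed.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a fitted model object to `path` using pickle."""
    with open(path, "wb") as output:
        pickle.dump(model, output)


def load_model(path):
    """Load a pickled model back into memory.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with open(path, "rb") as source:
        return pickle.load(source)


if __name__ == "__main__":
    # A plain dict stands in for the fitted `automl` object (assumption
    # made so that this sketch has no third-party dependencies).
    stand_in = {"model": "placeholder", "n_classes": 10}
    path = os.path.join(tempfile.mkdtemp(), "automl.pkl")
    save_model(stand_in, path)
    restored = load_model(path)
    print(restored == stand_in)
```

With the real object, this would presumably look like `save_model(automl, 'automl.pkl')` followed later by `load_model('automl.pkl').predict(X_test)`. Unpickling generally requires the same library versions that were installed when the object was saved.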

You can also step through the fit or predict methods in a debugger to see what is going on.