scikit-learn decision-tree

How to apply the exported sklearn trained tree to the test data


import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

cols_X = ['f1', 'f2']
df_train = pd.DataFrame([[1, 3, 4], [2, 5, 1], [7, 8, 7]], columns=['f1', 'f2', 'label'])
df_test = pd.DataFrame([[2, 5], [3, 1]], columns=cols_X)

tree = DecisionTreeRegressor()
tree.fit(df_train[cols_X], df_train['label'])

# `path` is a directory prefix defined elsewhere
with open(path + "myTree.txt", "w") as file:
    file.write(export_text(tree, feature_names=cols_X))
    file.write("\n")

input_tree = pd.read_csv(path + "myTree.txt")  # not sure if I should read it as csv

A sklearn regression tree has been trained and exported as a txt file. How do I import it and apply it to the test data to make predictions, as .predict() would? I am unfamiliar with the data structure in the txt file, so I am not even sure whether I should read it as plain text or as csv.


Solution

  • You are missing something important here.

    The text that export_text wrote to your .txt file is meant for interpretability, not for reloading the model for prediction.

    It prints the decision tree rules. This is neither the model nor its parameters; it is a human-readable view of what the model does internally. scikit-learn cannot load or rebuild a model from this txt file.

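    Retrain your model and save the fitted estimator itself with pickle.

    Example (using a KNeighborsClassifier; the same pattern applies to any fitted scikit-learn estimator):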

    import pickle
    from sklearn.neighbors import KNeighborsClassifier

    # Train the model on the training set (X_train, y_train assumed to exist)
    model_clf = KNeighborsClassifier(n_neighbors=3)
    model_clf.fit(X_train, y_train)

    # Save the classifier using pickle
    with open("model_clf_pickle", "wb") as f:
        pickle.dump(model_clf, f)

    # Load the classifier using pickle
    with open("model_clf_pickle", "rb") as f:
        my_model_clf = pickle.load(f)
    result_score = my_model_clf.score(X_test, y_test)
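
Applied to the data in the question, the same idea might look like the sketch below. It reuses the question's cols_X, df_train and df_test. The file name myTree.pkl, the temporary directory it is written to, and random_state=0 are assumptions made for illustration and are not part of the original post.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

cols_X = ['f1', 'f2']
df_train = pd.DataFrame([[1, 3, 4], [2, 5, 1], [7, 8, 7]], columns=cols_X + ['label'])
df_test = pd.DataFrame([[2, 5], [3, 1]], columns=cols_X)

# random_state is an assumption, added only to make the fit reproducible
tree = DecisionTreeRegressor(random_state=0)
tree.fit(df_train[cols_X], df_train['label'])

# export_text returns a plain string of rules; it is not a model object
rules = export_text(tree, feature_names=cols_X)
print(type(rules))

# Hypothetical location for the saved model (replace with your own path)
model_path = Path(tempfile.gettempdir()) / "myTree.pkl"

# Save the fitted estimator itself
with open(model_path, "wb") as f:
    pickle.dump(tree, f)

# Later, or in another script: load it back and predict on the test data
with open(model_path, "rb") as f:
    loaded_tree = pickle.load(f)

predictions = loaded_tree.predict(df_test[cols_X])
print(predictions)
```

The loaded object is a regular DecisionTreeRegressor, so .predict() behaves exactly as it does on the original tree. The txt file from export_text can still be kept alongside it for reading the rules by eye.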