pythonmachine-learningscikit-learnsklearn-pandaspyxll

KeyError:"['class']" not found in axis


I found a tutorial about decision tree algorithm using pyxll add-in for excel, and tried to execute. I get an error: KeyError:"['class']" not found in axis.

from pyxll import xl_func
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import os

@xl_func("float, int, int: object")
def ml_get_zoo_tree_2(train_size=0.75, max_depth=5, random_state=245245):
    # Load the zoo data
    dataset = pd.read_csv(os.path.join(os.path.dirname(__file__), "zoo.csv"))

    # Drop the animal names since this is not a good feature to split the data on
    dataset = dataset.drop("animal_name", axis=1)

    # Split the data into a training and a testing set
    features = dataset.drop("class", axis=1)
    targets = dataset["class"]

    train_features, test_features, train_targets, test_targets = \
        train_test_split(features, targets, train_size=train_size, random_state=random_state)

    # Train the model
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=max_depth)
    tree = tree.fit(train_features, train_targets)

    # Add the feature names to the tree for use in predict function
    tree._feature_names = features.columns

    return tree

If i removed line 17 and 18 for class code, then i get error NameError: name 'features' is not defined, then when i removed feature i get error as target has to be defined.


Solution

  • You need the correct dataset to go with that tutorial. You can download it (and the code) from here https://github.com/pyxll/pyxll-examples/tree/master/machine-learning.