python scikit-learn cross-validation k-fold

K-Fold cross-validator raises KeyError: None of [Int64Index ...]


I am trying to use the K-Fold cross-validator with a decision tree. I use a for loop to train and test on the splits produced by KFold, like this:

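import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv(r'C:\\Users\data.csv')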
    
# split data into X and y
X = df.iloc[:,:200]
Y = df.iloc[:,200]

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

clf = DecisionTreeClassifier()

kf = KFold(n_splits=5, shuffle=True, random_state=3)

cnt = 1

# Cross-Validate
for train, test in kf.split(X, Y):
    print(f'Fold:{cnt}, Train set: {len(train)}, Test set:{len(test)}')
    cnt += 1
    
    X_train = X[train]
    y_train = Y[train]
    X_test = X[test]
    y_test = Y[test]

    clf = clf.fit(X_train,y_train)

    predictions = clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    print("test")
    print(y_test)
    print("predict")
    print(predictions)
    print("Accuracy: %.2f%%" % (accuracy * 100.0))

When I run it, I get this error:

KeyError: "None of [Int64Index([  0,   1,   2,   5,   7,   8,   9,  10,  11,  12,\n            ...\n            161, 164, 165, 166, 167, 168, 169, 170, 171, 173],\n           dtype='int64', length=120)]

How can I fix it?


Solution

  • The issue is here:

    X_train = X[train]
    y_train = Y[train]
    X_test = X[test]
    y_test = Y[test]
    

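    kf.split returns arrays of integer row positions, but indexing a DataFrame with X[train] looks those integers up as column labels. None of them match your column names, hence the KeyError. To select rows by position, use the iloc property on both the DataFrame and the Series. This should solve your problem: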

    X_train = X.iloc[train]
    y_train = Y.iloc[train]
    X_test = X.iloc[test]
    y_test = Y.iloc[test]
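
For reference, here is a minimal end-to-end sketch of the corrected loop. It does not use the CSV from the question: the data is a small synthetic DataFrame, and the column names (f0 to f3), the label rule, and the sizes are assumptions made up for illustration. It first shows that X[train]-style indexing raises a KeyError, then runs the folds with iloc.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CSV in the question (hypothetical data).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(150, 4)), columns=["f0", "f1", "f2", "f3"])
Y = pd.Series((X["f0"] + X["f1"] > 0).astype(int), name="label")

# X[<integer array>] is a column lookup by label, so it fails here
# because no column is named 0 or 1.
try:
    X[np.array([0, 1])]
    raised_key_error = False
except KeyError:
    raised_key_error = True

kf = KFold(n_splits=5, shuffle=True, random_state=3)
accuracies = []

for fold, (train, test) in enumerate(kf.split(X, Y), start=1):
    # train and test are arrays of row positions, so select with iloc.
    X_train, X_test = X.iloc[train], X.iloc[test]
    y_train, y_test = Y.iloc[train], Y.iloc[test]

    # A fresh classifier per fold keeps the folds independent.
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, clf.predict(X_test))
    accuracies.append(accuracy)
    print(f"Fold {fold}: train={len(train)}, test={len(test)}, accuracy={accuracy:.2%}")

print(f"Mean accuracy: {np.mean(accuracies):.2%}")
```

Note that the train_test_split call from the question is not needed here, since the loop overwrites X_train, X_test, y_train, and y_test on every fold anyway.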