I am trying to use the K-Folds cross-validator with a decision tree. I use a for loop to train and test on the folds from KFold, as in this code:
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv(r'C:\\Users\data.csv')
# split data into X and y
X = df.iloc[:,:200]
Y = df.iloc[:,200]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
clf = DecisionTreeClassifier()
kf = KFold(n_splits=5, shuffle=True, random_state=3)
cnt = 1
# Cross-Validate
for train, test in kf.split(X, Y):
    print(f'Fold:{cnt}, Train set: {len(train)}, Test set:{len(test)}')
    cnt += 1
    X_train = X[train]
    y_train = Y[train]
    X_test = X[test]
    y_test = Y[test]
    clf = clf.fit(X_train,y_train)
    predictions = clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    print("test")
    print(y_test)
    print("predict")
    print(predictions)
    print("Accuracy: %.2f%%" % (accuracy * 100.0))
When I run it, it shows this error:
KeyError: "None of [Int64Index([ 0, 1, 2, 5, 7, 8, 9, 10, 11, 12,\n ...\n 161, 164, 165, 166, 167, 168, 169, 170, 171, 173],\n dtype='int64', length=120)]
How can I fix it?
The issue is here:
X_train = X[train]
y_train = Y[train]
X_test = X[test]
y_test = Y[test]
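kf.split yields arrays of integer row positions, but X[train] on a DataFrame looks those integers up as column labels, which is what raises the KeyError.
To select rows by position, use the iloc property. This should solve your problem: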
X_train = X.iloc[train]
y_train = Y.iloc[train]
X_test = X.iloc[test]
y_test = Y.iloc[test]
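For reference, here is a minimal self-contained sketch of the corrected loop. It assumes a synthetic DataFrame in place of data.csv (150 rows, 200 feature columns plus one label column; the column names and random values are made up for illustration), so the accuracy it prints says nothing about your real data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for data.csv (assumption): 200 features and a binary label.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(150, 200)),
    columns=[f"f{i}" for i in range(200)],  # hypothetical column names
)
df["label"] = rng.integers(0, 2, size=150)

# Split into features and target by position, as in the question.
X = df.iloc[:, :200]
Y = df.iloc[:, 200]

kf = KFold(n_splits=5, shuffle=True, random_state=3)
fold_accuracies = []

for fold, (train, test) in enumerate(kf.split(X, Y), start=1):
    # train/test are arrays of row positions, so index with .iloc.
    X_train, X_test = X.iloc[train], X.iloc[test]
    y_train, y_test = Y.iloc[train], Y.iloc[test]

    # Fit a fresh tree on each fold's training rows.
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)

    accuracy = accuracy_score(y_test, predictions)
    fold_accuracies.append(accuracy)
    print(f"Fold {fold}: train={len(train)}, test={len(test)}, "
          f"accuracy={accuracy * 100:.2f}%")

print(f"Mean accuracy: {np.mean(fold_accuracies) * 100:.2f}%")
```

Converting to NumPy arrays first (X.to_numpy() and Y.to_numpy()) would also work, since plain bracket indexing on a NumPy array with an integer array selects rows by position.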