I am trying to fit 5 fold cross validation for several values of k. I used the OJ data set in ISLR package.
my code so far as follows,
library(ISLR)
library(class)
ks=c(1:5)
err.rate.test <- numeric(length = 5)
folds <- cut(seq(1,nrow(OJ)),breaks=5,labels=FALSE)
for (j in seq(along = ks)) {
set.seed(123)
cv.knn <- sapply(1:5, FUN = function(i) {
testID <- which(folds == i, arr.ind = TRUE)
test.X <- OJ[testID, 3]
test.Y <- OJ[testID, 1]
train.X <- OJ[-testID, 3]
train.Y <- OJ[-testID, 1]
knn.test <- knn(data.frame(train.X), data.frame(test.X), train.Y, k = ks[j])
cv.test.est <- mean(knn.test != test.Y)
return(cv.test.est)
})
err.rate.test[j] <- mean(cv.knn)
}
err.rate.test
[1] 0.3757009 0.3757009 0.3757009 0.3757009 0.3757009
The code doesn't give any errors. But for some reason, my test error rate for each value of k is same. This seems to be weird for me. So i assume there is something wrong with my code.
Can anyone help me to figure that out?
remove set.seed(123)
, this causes the repeat error rates.
set.seed
is used for reproducibility, ensuring that any random grid searches or parameter estimates remain constant, meaning all of the parameter estimates that go into fitting the knn
model will be the same across executions, resulting in the same predictions and therefore the same error rates.