I am learning R. I am working on the 'Human Activity Recognition' dataset from the internet. It has 563 variables; the last one is the class variable 'Activity', which has to be predicted.
I am trying to use the KNN algorithm from R's caret package.
I have created another dataset containing the 561 numeric variables, excluding the last 2 (subject and Activity).
I ran PCA on it and decided to use the top 20 principal components (PCs).
pca1 <- prcomp(human2, scale. = TRUE)
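One common way to justify how many PCs to keep is the cumulative proportion of variance explained, which can be computed from the sdev component of a prcomp object (summary() on the object prints the same table). A minimal base-R sketch, using the built-in USArrests data as a stand-in for human2; both the stand-in data and the 90% cut-off are illustrative assumptions:

```r
# USArrests (built into R) stands in for the 561 HAR predictors
pca_demo <- prcomp(USArrests, scale. = TRUE)

# proportion of total variance carried by each component
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_var <- cumsum(var_explained)

# smallest number of PCs whose cumulative share reaches the illustrative 90% threshold
n_pcs <- which(cum_var >= 0.90)[1]
cum_var
n_pcs
```

On the real data the same two lines applied to pca1 show how much variance the first 15 or 20 PCs retain.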
I saved the scores of those PCs in another object called 'newdat':
newdat <- pca1$x[ ,1:20]
Now I am trying to run the code below, but it gives me an error because newdat doesn't contain my class variable:
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~ ., data = newdat, method = "knn",
                 trControl = trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)
I tried to extract the last column, 'Activity', from the raw data and append it to 'newdat' with cbind() so I could use it in knn_fit (above), but it is not getting appended.
Any suggestions on how to use the PCs?
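The column most likely does not get appended the way you expect because pca1$x[ ,1:20] is a numeric matrix, not a data frame, and a matrix can hold only one type. cbind() on a matrix keeps only a factor's integer codes (read.csv() returns Activity as a factor when stringsAsFactors = TRUE), while converting the label with as.character() first turns every column into text. data.frame() keeps each column's own type. A small base-R sketch with made-up values; pcs and activity are hypothetical stand-ins for newdat and human1$Activity:

```r
# a tiny stand-in for pca1$x[, 1:20]: a numeric matrix of PC scores
pcs <- matrix(c(1.5, -0.2, 0.7, 2.1), ncol = 2,
              dimnames = list(NULL, c("PC1", "PC2")))
# a factor label, as read.csv() gives with stringsAsFactors = TRUE
activity <- factor(c("WALKING", "SITTING"))

# cbind() drops the factor class and stores its integer codes (SITTING = 1, WALKING = 2)
m_codes <- cbind(pcs, Activity = activity)

# converting to character first makes the whole matrix character, PCs included
m_chr <- cbind(pcs, Activity = as.character(activity))

# data.frame() keeps numeric PC columns next to a factor label
newdat_df <- data.frame(pcs, Activity = activity)
str(newdat_df)
```

Applied to the question, that would be newdat <- data.frame(pca1$x[ ,1:15], Activity = human1$Activity); if Activity was read as character, wrapping it in factor() makes the classification target explicit.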
Below is the code:
human1 <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/train1.csv", header = TRUE)
humant <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/test1.csv", header = TRUE)
#taking the predictor columns
human2 <- human1[ ,1:561]
pca1 <- prcomp(human2, scale. = TRUE)
newdat <- pca1$x[ ,1:15]
newdat <- cbind(newdat, Activity = as.character(human1$Activity))
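# note: this reuses the name pca1, replacing the prcomp() result above
pca1 <- preProcess(human1[,1:561],
                   method = c("BoxCox", "center", "scale", "pca"))
# PC (the transformed training data) is not used below
PC <- predict(pca1, human1[,1:561])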
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~ ., data = newdat, method = "knn",
                 trControl = trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)
#applying knn_fit to test data
test_pred <- predict(knn_fit, newdata = testing)
test_pred
#checking the prediction
confusionMatrix(test_pred, testing$V1 )
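For the prediction step, the test predictors have to be projected into the same PC space as the training data before predict() is called on knn_fit; a model trained on PC1..PCk columns cannot be applied to the raw 561 columns. A sketch of the whole workflow under stated assumptions: the built-in iris data replaces train1.csv/test1.csv, a createDataPartition() split replaces the separate files, and k = 2 PCs replaces 15 or 20:

```r
library(caret)

# iris stands in for train1.csv / test1.csv (assumption): 4 numeric predictors
set.seed(3333)
idx     <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_x <- iris[idx, 1:4]
train_y <- iris$Species[idx]
test_x  <- iris[-idx, 1:4]
test_y  <- iris$Species[-idx]

# PCA fitted on the training predictors only
pca <- prcomp(train_x, scale. = TRUE)
k   <- 2  # number of PCs kept (15 or 20 in the question)

# data.frame() keeps the PC scores numeric and the label a factor
train_pcs <- data.frame(pca$x[, 1:k], Activity = train_y)

# project the test predictors with the training centre, scale and rotation
test_pcs <- data.frame(predict(pca, newdata = test_x)[, 1:k])

trctrl  <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
knn_fit <- train(Activity ~ ., data = train_pcs, method = "knn",
                 trControl = trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 5)

# predictions on the projected test set, compared with the true labels
test_pred <- predict(knn_fit, newdata = test_pcs)
cm <- confusionMatrix(test_pred, test_y)
cm
```

If you prefer the preProcess() route from the question, the same rule applies: fit it on the training predictors only and call predict() with that one fitted object on both the training and the test predictors.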
I am running into an error in the part below. Here is the error:
> knn_fit <- train(Activity ~., data = newdat, method = "knn",
+ trControl=trctrl,
+ preProcess = c("center", "scale"),
+ tuneLength = 10)
Error: cannot allocate vector of size 1.3 Gb
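The 1.3 Gb allocation is most likely a side effect of the same coercion, although the exact newdat passed to train() is not shown. After cbind() with as.character(), every PC score is a string; when the formula interface builds its predictor matrix, string columns are treated as factors, so every distinct PC value becomes its own dummy column. The base-R sketch below mimics that step with model.matrix() on simulated scores (an assumption, not the HAR data):

```r
set.seed(1)
n   <- 200
pcs <- matrix(rnorm(n * 3), ncol = 3,
              dimnames = list(NULL, paste0("PC", 1:3)))
activity <- factor(sample(c("WALKING", "SITTING"), n, replace = TRUE))

# what cbind(..., as.character(...)) produces, as a data frame: every PC value is a string
bad  <- as.data.frame(cbind(pcs, Activity = as.character(activity)))
# numeric PCs plus a factor label
good <- data.frame(pcs, Activity = activity)

# model.matrix() treats string columns as factors: one dummy per distinct value
X_bad  <- model.matrix(Activity ~ ., data = bad)
X_good <- model.matrix(Activity ~ ., data = good)
dim(X_bad)
dim(X_good)
```

With thousands of rows and 15 or 20 such columns, the dummy matrix grows to tens of thousands of columns, which is consistent with an allocation error of this size.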
How did you try to cbind the column? Could you please show the code? I think you simply ran into the difficulties caused by stringsAsFactors = TRUE. Does the following line solve your problem?
#...
#newdat <- pca1$x[ ,1:20]
# human2 holds only the 561 predictors, so take Activity from human1
newdat <- cbind(newdat, Activity = as.character(human1$Activity))
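The stringsAsFactors setting mentioned above decides whether read.csv() returns a text column such as Activity as a factor or as character; its default changed from TRUE to FALSE in R 4.0.0, so the same script can behave differently across R versions. A quick check with a made-up three-column CSV given inline (an assumption standing in for train1.csv):

```r
# hypothetical CSV content standing in for train1.csv
csv <- "f1,f2,Activity\n0.1,0.2,WALKING\n0.3,0.4,SITTING\n"

# the same text read both ways
as_factor <- read.csv(text = csv, stringsAsFactors = TRUE)
as_chr    <- read.csv(text = csv, stringsAsFactors = FALSE)

class(as_factor$Activity)
class(as_chr$Activity)
```

Checking class(human1$Activity) the same way shows which case applies before the label is combined with the PC scores.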