Tags: r, classification, knn, predictive

How to use the PCs (resulting from PCA) on my dataset in R?


I am learning R. I am working on the 'Human Activity Recognition' dataset from the internet. It has 563 variables; the last one is the class variable 'Activity', which has to be predicted.

I am trying to use the KNN algorithm from R's caret package.

I have created another dataset containing the 561 numeric variables, excluding the last two (subject and Activity).
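Dropping the two non-predictor columns by name rather than by position is a little more robust if the column order ever changes. A minimal sketch on a hypothetical toy data frame that only mimics the layout described (numeric features followed by `subject` and `Activity`; the feature names and values are made up):

```r
# Hypothetical toy stand-in for the raw training data: numeric features,
# then the subject identifier and the class label (names/values are made up)
human1 <- data.frame(
  feat1 = c(0.28, 0.27, 0.28),
  feat2 = c(-0.02, -0.01, -0.03),
  subject = c(1L, 1L, 2L),
  Activity = c("STANDING", "SITTING", "WALKING")
)

# Keep every column except the identifier and the class label
predictor_cols <- setdiff(names(human1), c("subject", "Activity"))
human2 <- human1[, predictor_cols, drop = FALSE]

# prcomp() needs purely numeric input, so check before running PCA
stopifnot(all(vapply(human2, is.numeric, logical(1))))
```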

I ran PCA on that and decided to use the top 20 PCs:

pca1 <- prcomp(human2, scale = TRUE)
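One common way to decide how many PCs to keep is the cumulative proportion of variance they explain; `summary(pca1)` prints the same figures in its "Cumulative Proportion" row. A minimal sketch, using the built-in `iris` measurements as a stand-in for `human2` and an assumed 95% cut-off (both are illustrative choices, not taken from the question):

```r
# Stand-in data: the four numeric iris measurements (assumption for illustration)
X <- iris[, 1:4]
pca <- prcomp(X, scale. = TRUE)

# Proportion of the total variance carried by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of PCs whose cumulative variance reaches the (assumed) 95% threshold
k <- which(cum_var >= 0.95)[1]
print(round(cum_var, 3))
print(k)
```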

I saved the scores of those PCs in another object called 'newdat':

newdat <- pca1$x[ ,1:20]
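`pca1$x` holds the PC scores: the centred and scaled data multiplied by the rotation (loadings) matrix. (`prcomp()`'s argument is spelled `scale.`; `scale = TRUE` reaches it through partial matching.) So `pca1$x[ ,1:20]` is a plain numeric matrix with one row per observation and no class column. A small sketch checking that relationship, again with `iris` as a stand-in:

```r
X <- as.matrix(iris[, 1:4])
pca <- prcomp(X, scale. = TRUE)

# Scores = standardised data %*% rotation, using the centre/scale stored by prcomp()
manual_scores <- scale(X, center = pca$center, scale = pca$scale) %*% pca$rotation

# Keep the first two PCs, just as newdat <- pca1$x[ ,1:20] keeps twenty
top <- pca$x[, 1:2]
class(top)   # a matrix, not a data.frame, so it has nowhere to hold a factor label
```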

Now I am trying to run the code below, but it gives me an error because newdat doesn't contain my class variable:

trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~., data = newdat, method = "knn",
                 trControl=trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)
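For intuition about what `trainControl(method = "repeatedcv", number = 10, repeats = 3)` together with `tuneLength = 10` asks for: each candidate k is scored on 10 held-out folds, repeated 3 times, i.e. 30 fits per candidate. A rough base-R sketch of that loop, using `class::knn` rather than caret, `iris` PC scores as stand-in data, simple random (not stratified) folds, and a hypothetical grid of k values. For simplicity the PCA is fit once on all rows, as in the question; caret's `preProcess` inside `train()` would instead refit it within each resample.

```r
library(class)  # knn(); a recommended package shipped with R

set.seed(3333)
pcs <- prcomp(iris[, 1:4], scale. = TRUE)$x[, 1:2]  # stand-in PC scores
y <- iris$Species

candidate_k <- c(1, 3, 5, 7, 9)  # hypothetical tuning grid
n_folds <- 10
n_repeats <- 3

cv_accuracy <- sapply(candidate_k, function(k) {
  acc <- numeric(0)
  for (r in seq_len(n_repeats)) {
    # Randomly assign every row to one of the folds for this repeat
    folds <- sample(rep(seq_len(n_folds), length.out = nrow(pcs)))
    for (f in seq_len(n_folds)) {
      held_out <- folds == f
      pred <- knn(pcs[!held_out, ], pcs[held_out, ], cl = y[!held_out], k = k)
      acc <- c(acc, mean(pred == y[held_out]))
    }
  }
  mean(acc)  # average accuracy over n_folds * n_repeats held-out sets
})

best_k <- candidate_k[which.max(cv_accuracy)]
print(cv_accuracy)
print(best_k)
```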

I tried to extract the last column, 'Activity', from the raw data and append it to 'newdat' with cbind() so I could use it in knn_fit (above), but it is not getting appended.
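A likely reason the column seems "not appended": `cbind()` on a matrix returns another matrix, and a matrix holds a single type. A factor is stored as integer codes, so those codes are what end up in the matrix, while a character vector turns every column into text. `data.frame()` keeps each column's own type. A minimal sketch with made-up scores and labels:

```r
# Stand-in for pca1$x[ ,1:20]: two rows of made-up PC scores
scores <- matrix(c(1.5, -0.2, 0.3, 2.1), nrow = 2,
                 dimnames = list(NULL, c("PC1", "PC2")))
activity <- factor(c("WALKING", "SITTING"))  # hypothetical labels

m_factor <- cbind(scores, Activity = activity)                # labels become integer codes
m_char   <- cbind(scores, Activity = as.character(activity))  # whole matrix becomes character

# A data frame keeps the PCs numeric and the label a factor
df <- data.frame(scores, Activity = activity)
str(df)
```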

Any suggestions on how to use the PCs?


Below is the code:

human1 <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/train1.csv", header = TRUE)
humant <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/test1.csv", header = TRUE)

#take the 561 predictor columns (drop subject and Activity)
human2 <- human1[ ,1:561]


pca1 <- prcomp(human2, scale = TRUE)
newdat <- pca1$x[ ,1:15]
newdat <- cbind(newdat, Activity = as.character(human1$Activity))

#note: this reassigns pca1 (overwriting the prcomp() result above), and PC is not used below
pca1 <- preProcess(human1[,1:561], 
                   method=c("BoxCox", "center", 
                            "scale", "pca"))
PC = predict(pca1, human1[,1:561])


trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~., data = newdat, method = "knn",
                 trControl=trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)

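#apply knn_fit to the test data
#('testing' is not created above; it must contain the same PC columns as newdat)

test_pred <- predict(knn_fit, newdata = testing)
test_pred

#check the predictions against the true labels
confusionMatrix(test_pred, testing$V1 )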
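Two steps this listing still needs: the test rows have to be projected onto the *training* PCs (same centring, scaling and rotation, via `predict()` on the `prcomp` object), and the predictions compared against the true labels. A base-R sketch of that pipeline, with `iris` split into hypothetical train/test parts, `class::knn` standing in for the caret model, and k = 5 chosen arbitrarily:

```r
library(class)  # knn()

set.seed(3333)
train_idx <- sample(nrow(iris), 100)
train_df <- iris[train_idx, ]
test_df  <- iris[-train_idx, ]

# Fit the PCA on the training predictors only
pca_train <- prcomp(train_df[, 1:4], scale. = TRUE)
n_pc <- 2
train_pcs <- pca_train$x[, 1:n_pc]

# Project the test predictors with the training centre, scale and rotation
test_pcs <- predict(pca_train, newdata = test_df[, 1:4])[, 1:n_pc]

pred <- knn(train_pcs, test_pcs, cl = train_df$Species, k = 5)

# Confusion matrix: rows = predicted, columns = actual
cm <- table(Predicted = pred, Actual = test_df$Species)
accuracy <- sum(diag(cm)) / sum(cm)
print(cm)
print(accuracy)
```

With caret, the same idea would mean wrapping the projected test scores in a data frame with the same PC column names as the training data before calling `predict(knn_fit, newdata = ...)`, then passing the true labels as the `reference` argument of `confusionMatrix()`.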

I am running into an error in the part below. Here is the error:

> knn_fit <- train(Activity ~., data = newdat, method = "knn",
+                  trControl=trctrl,
+                  preProcess = c("center", "scale"),
+                  tuneLength = 10)
Error: cannot allocate vector of size 1.3 Gb

Solution

  • How have you tried to cbind the column? Could you please show the code? I think you simply ran into the difficulties caused by stringsAsFactors = TRUE. Does the following line solve your problem?

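    #...
    #newdat <- pca1$x[ ,1:20]
    # human2 holds only the 561 predictors, so take Activity from human1;
    # data.frame() keeps the PCs numeric and stores Activity as a factor
    newdat <- data.frame(newdat, Activity = as.factor(human1$Activity))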
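Whichever way the column is attached, `train(Activity ~., data = ...)` goes through R's formula machinery, and `stats::model.frame()` refuses a plain matrix, which is what `cbind()` on `pca1$x` returns. A minimal check with made-up scores and hypothetical labels:

```r
scores <- matrix(c(1.5, -0.2, 0.3, 2.1, 0.7, -1.1), nrow = 3,
                 dimnames = list(NULL, c("PC1", "PC2")))
labels <- c("WALKING", "SITTING", "WALKING")  # hypothetical labels

as_matrix <- cbind(scores, Activity = labels)               # character matrix
as_df     <- data.frame(scores, Activity = factor(labels))  # numeric PCs + factor label

# The formula interface accepts the data frame...
mf <- model.frame(Activity ~ ., data = as_df)

# ...but stops with an error for the matrix
err <- tryCatch(model.frame(Activity ~ ., data = as_matrix),
                error = function(e) conditionMessage(e))
print(err)
```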