I would like to plot the individuals from a PLS-DA fitted with the caret package in R (similar to a PCA plot), with points colored by group (see the attached picture; it shows a PCA example, but I would like the same kind of graph for PLS-DA). Can someone help me with this?
Here is code to generate random data similar to the data I used. Ycalib contains a class variable with 2 levels, and Xcalib contains 539 spectral wavelengths (the code below generates only 10).
set.seed(1001)
Ycalib <- data.frame(
y = sample(c("0", "1"), 10, replace = TRUE)
)
set.seed(1001)
Xcalib <- data.frame(
x1 = sample(1:10),
x2 = sample(1:10),
x3 = sample(1:10),
x4 = sample(1:10),
x5 = sample(1:10),
x6 = sample(1:10),
x7 = sample(1:10),
x8 = sample(1:10),
x9 = sample(1:10),
x10= sample(1:10)
)
Here is my code for PLS-DA in caret:
library(caret)
set.seed(1001)
ctrl<-trainControl(method="repeatedcv",number=10,classProbs = TRUE,summaryFunction = twoClassSummary)
plsda<-train(x=Xcalib, # spectral data
y=Ycalib, # factor vector
method="pls", # pls-da algorithm
tuneLength=10, # number of components
trControl=ctrl, # ctrl contained cross-validation option
preProc=c("center","scale"), # the data are centered and scaled
metric="ROC") # metric is ROC for 2 classes
plsda
I hope it is clear enough as I am a beginner with R.
The object you need to extract from the train model is called finalModel; its scores element holds the component scores for each sample. Using your object names above, the scores are extracted as follows:
# unclass() drops any extra class attribute so the scores convert as a plain matrix
dfscores <- as.data.frame(unclass(plsda$finalModel$scores))
Then you can bind this to your original data and plot as you wish, e.g., using ggplot2.
library(ggplot2)
p <- ggplot(dfscores, aes(x = `Comp 1`, y = `Comp 2`, color = Ycalib$y)) + geom_point() # Ycalib is a data frame, so color by its y column
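To make the "bind, then plot" step concrete, here is a minimal sketch. It does not fit a model: scores_mat is a hypothetical stand-in for plsda$finalModel$scores filled with random numbers, and group stands in for Ycalib$y. The column names "Comp 1" and "Comp 2" are assumed to match what the fitted model returns, so check colnames() on your own scores first.

```r
# Hypothetical stand-in for plsda$finalModel$scores: one row per sample,
# columns named "Comp 1" and "Comp 2" (random values, illustration only).
set.seed(1001)
scores_mat <- matrix(rnorm(20), nrow = 10,
                     dimnames = list(NULL, c("Comp 1", "Comp 2")))

# Class labels, one per sample (stands in for Ycalib$y from the question).
group <- factor(sample(c("0", "1"), 10, replace = TRUE))

# unclass() is a precaution: it strips any extra class attribute so that
# as.data.frame() treats the object as a plain matrix.
dfscores <- as.data.frame(unclass(scores_mat))

# Bind the group labels to the scores as a new column.
dfscores$group <- group

# Build the score plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dfscores, aes(x = `Comp 1`, y = `Comp 2`, color = group)) +
    geom_point(size = 3) +
    labs(x = "Component 1", y = "Component 2", color = "Group")
}
```

Call print(p) to display the plot. Keeping the labels as a column of dfscores, rather than referring to an outside object inside aes(), means the data frame carries everything the plot needs, which also makes later additions such as facets or ellipses easier.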