PLS in R: Extracting PRESS statistic values


I'm relatively new to R and am currently constructing a PLS model using the pls package. I have two independent datasets of equal size; the first is used here to calibrate the model. The dataset comprises multiple response variables (y) and 101 explanatory variables (x) for 28 observations. Each response variable, however, will be included separately in its own PLS model. The code currently looks as follows:

# load data
data <- read.table("....txt", header=TRUE)
data <- as.data.frame(data)

# define response variables (y)
HEIGHT <- as.numeric(unlist(data[2]))
FBM <- as.numeric(unlist(data[3]))
N <- as.numeric(unlist(data[4]))
C <- as.numeric(unlist(data[5]))
CHL <- as.numeric(unlist(data[6]))

# select the explanatory (x) variables only (converted to a matrix in the model call)
spectra <- data[8:ncol(data)]

# calibrate PLS model using LOO and 20 components
library(pls)
refl.pls <- plsr(N ~ as.matrix(spectra), ncomp=20, validation = "LOO", jackknife = TRUE)

# visualize RMSEP -vs- number of components
plot(RMSEP(refl.pls), legendpos = "topright")

# calculate explained variance for x & y variables
summary(refl.pls) 

I have now reached the point where I need to decide, for each response variable, the optimal number of components to include in my PLS model. The RMSEP values already provide a decent indication. However, I would also like to base my decision on the PRESS (Predicted Residual Sum of Squares) statistic, in accordance with various studies comparable to the one I am conducting. In short, I would like to extract the PRESS statistic for each PLS model with n components.

I have browsed through the pls package documentation and searched the web, but unfortunately have been unable to find an answer. If anyone here could point me in the right direction, that would be greatly appreciated!


Solution

  • You can find the PRESS values in the mvr object.

    refl.pls$validation$PRESS
    

    You can see this either by exploring the object directly with str or by reading the documentation more closely. If you look at ?mvr, you will see the following:

    validation  if validation was requested, the results of the 
                cross-validation. See mvrCv for details.
    

    Validation was indeed requested, so we follow this to ?mvrCv, where you will find:

    PRESS       a matrix of PRESS values for models with 1, ..., 
                ncomp components. Each row corresponds to one response variable.
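
    To make this concrete, here is a minimal sketch of pulling the PRESS values out and picking the number of components with the smallest PRESS. It uses the yarn example data that ships with the pls package rather than your spectra, and the names fit, press and best.ncomp are only illustrative. Taking the minimum PRESS is one simple selection rule, not necessarily the one used in the studies you mention.

```r
library(pls)

# Example data shipped with the pls package (an assumption for illustration;
# substitute your own response and spectra, e.g. N ~ as.matrix(spectra)).
data(yarn)

# Fit with leave-one-out cross-validation so that $validation is populated.
fit <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "LOO")

# PRESS matrix: one row per response variable,
# one column per number of components (1, ..., ncomp).
press <- fit$validation$PRESS
print(press)

# Number of components with the smallest PRESS for the single response.
best.ncomp <- which.min(press[1, ])
print(best.ncomp)
```

    With your own model, the same two steps would apply to refl.pls$validation$PRESS. Since you fit each response separately, you would repeat this once per response variable.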