rpls

Why do PLS regression coefficients with R [pls] differ with those from other R packages?


Out of curiosity, I am trying to figure out why the PLS regression coefficients obtained with pls differ from the coefficients obtained with plsRglm, ropls, or plsdepot which all provide the same results.

Here is some code to start with. I have tried to play with the scale, center, and method arguments of the plsr function... but no success so far.

library(pls)
library(plsRglm)
library(ropls)
library(plsdepot)

data(Cornell)

pls.plsr <- plsr(
  Y~X1+X2+X3+X4+X5+X6+X7, 
  data = Cornell, 
  ncomp = 3, 
  scale = TRUE, 
  center = TRUE
)

plsRglm.plsr <- plsR(
  Y~X1+X2+X3+X4+X5+X6+X7, 
  data = Cornell, 
  nt = 3, 
  scaleX = TRUE
)

ropls.plsr <- opls(
  as.matrix(Cornell[, grep("X", colnames(Cornell))]),
  Cornell[, "Y"], 
  scaleC = "standard"
)

plsdepot.plsr <- plsreg1(
  as.matrix(Cornell[, grep("X", colnames(Cornell))]),
  Cornell[, "Y"],
  comps = 3
)

## extract PLS regression coefficients for the PLS model with three components
coef(pls.plsr) # a
coef(plsRglm.plsr, type = "original") # b
coef(plsRglm.plsr, type = "scaled") # c
coef(ropls.plsr) # c
plsdepot.plsr$std.coefs # c
plsdepot.plsr$reg.coefs # b

Solution

  • Firstly, just for re-formatting purposes, we write:

    library(pls)
    library(plsRglm)
    library(ropls)
    library(plsdepot)
    
    data(Cornell)
    pls.plsr <- plsr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, 
                     data = Cornell, 
                     ncomp = 3, scale = T, center = T)
    plsRglm.plsr <- plsR(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, 
                        data = Cornell, 
                        nt = 3, scaleX = TRUE)
    ropls.plsr <- opls(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
                       Cornell[, "Y"], scaleC = "standard")
    plsdepot.plsr <- plsreg1(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
                             Cornell[, "Y"], comps = 3)
    

    That done, you may extract the coefficients in the original scale:

    ### ORIGINAL SCALE -  plsRglm, plsdepot
    coef(plsRglm.plsr, type = "original")
    plsdepot.plsr$reg.coefs
    

    Or you can have them scaled:

    ### SCALED - plsRglm, ropls, plsdepot
    coef(plsRglm.plsr, type = "scaled")
    coef(ropls.plsr)
    plsdepot.plsr$std.coefs
    

    Therefore all methods now give rise to the same coefficients... Except for pls::plsr. Why? You may ask. The key is in the command. When you run:

    coef(pls.plsr) # , , 3 comps
    

    You see ", , 3". That is characteristic of a tensor object. What is this? Coefficients should be simply a vector. The reason is that coef is a generic function and it is not working properly for pls::plsr models. To see what it is actually extracting:

    pls.plsr$coefficients
    matrix(pls.plsr$coefficients, ncol = 3) # or in matrix form. coef simply extracts the third column (it should not)
    

    But you can see the same fit for all models if you examine the equivalent object in each R-package as in:

    matrix(pls.plsr$projection, ncol = 3)    
    plsRglm.plsr$wwetoile
    plsdepot.plsr$mod.wgs
    ropls.plsr@weightStarMN
    

    Therefore, for pls::plsr you were simply not extracting the coefficients.