Tags: r-caret, shapefile, r-sf, lasso-regression, variable-selection

How to use CAST package for a shapefile (polygons) in R?


Any help with the following is really appreciated!

My goal: I need to run a lasso model for variable selection on my data (which is in sf polygon format).

My data: as said above, it is an sf object; specifically, a shapefile with polygons.

I have tried both ffs and train, but neither works. Here is a reproducible example using a multipolygon shapefile.

Please ignore the possible temporal relationship between the variables ending in "74" and those ending in "79".

library(sf)
library(CAST)

#Loading data
nc <- st_read(system.file("shape/nc.shp", package="sf"))

#Training and test data
set.seed(100)
ind   <- sample(2,nrow(nc),replace=T,prob = c(0.7,0.3))
train <- nc[ind==1,]
test  <- nc[ind==2,]

predictors <- c("SID74","BIR79","BIR74")
response   <- "NWBIR79"
## 1st option ##
#==============#
#ffs Forward feature selection
set.seed(10)
ffs(train[,predictors], train$NWBIR79,method = "lasso")
[1] "model using SID74,BIR79 will be trained now..."
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA   Max.   : NA  
 NA's   :3     NA's   :3     NA's   :3    
Error: Stopping
In addition: There were 26 warnings (use warnings() to see them)
## 2nd option ##
#==============#
#model without ffs
set.seed(100)
model <- train(train[,predictors], train$NWBIR79, method="lasso", trControl=trainControl(method = "cv"),importance=T)
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA   Max.   : NA  
 NA's   :3     NA's   :3     NA's   :3    
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)
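A quick check of what the calls above actually receive as predictors can make the failure easier to diagnose. The sketch below rebuilds the same data and split, then inspects train[,predictors] before and after st_drop_geometry(); nothing is fitted. The object names x_sf and x_df are illustrative only.

```r
library(sf)

# Same data and split as in the reproducible example
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
set.seed(100)
ind   <- sample(2, nrow(nc), replace = TRUE, prob = c(0.7, 0.3))
train <- nc[ind == 1, ]

predictors <- c("SID74", "BIR79", "BIR74")

# Subsetting an sf object with `[` keeps the geometry column (geometry is "sticky")
x_sf <- train[, predictors]
class(x_sf)   # "sf" "data.frame"
names(x_sf)   # the three predictors plus the geometry column

# st_drop_geometry() returns a plain data.frame with only the numeric predictors
x_df <- st_drop_geometry(x_sf)
class(x_df)   # "data.frame"
names(x_df)   # "SID74" "BIR79" "BIR74"
```

The extra geometry list column is what ends up in the predictor matrix, which the lasso fit cannot handle.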

Solution

  • Two things lead to the error:

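    1. The geometries are still part of the predictors: train[,predictors] returns an sf object, so its geometry column is passed to the model along with the three variables. Drop the geometries first: st_drop_geometry(train[,predictors])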

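    2. "importance" is not a parameter of the lasso method. caret forwards extra arguments to the underlying fitting function, which does not accept it, so every model fit fails.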

    model <- train(st_drop_geometry(train[,predictors]), train$NWBIR79, method="lasso", trControl=trainControl(method = "cv"))

    The same fix applies to CAST::ffs: pass st_drop_geometry(train[,predictors]) as the predictors.
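Putting both fixes together, a minimal end-to-end sketch could look like the following. It assumes caret, CAST and elasticnet (which caret uses for method = "lasso") are installed. The training set is renamed train_sf here only so it is not confused with caret's train() function, and 10-fold CV is used for ffs as well; both are choices made for this example, not requirements.

```r
library(sf)
library(caret)   # train(), trainControl()
library(CAST)    # ffs()
# method = "lasso" requires the elasticnet package to be installed

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

set.seed(100)
ind      <- sample(2, nrow(nc), replace = TRUE, prob = c(0.7, 0.3))
train_sf <- nc[ind == 1, ]
test_sf  <- nc[ind == 2, ]

predictors <- c("SID74", "BIR79", "BIR74")

# Plain data.frame of predictors (geometry dropped) and a numeric response
x <- st_drop_geometry(train_sf[, predictors])
y <- train_sf$NWBIR79

# Lasso on all predictors, 10-fold CV, no `importance` argument
set.seed(100)
model <- train(x, y, method = "lasso",
               trControl = trainControl(method = "cv"))

# Forward feature selection with the same geometry-free predictors
set.seed(10)
ffs_model <- ffs(x, y, method = "lasso",
                 trControl = trainControl(method = "cv"))
ffs_model$selectedvars   # variables kept by the forward selection

# Predictions for the held-out polygons also need the geometry dropped
pred <- predict(model, newdata = st_drop_geometry(test_sf[, predictors]))
```

If the results need to be mapped afterwards, the predictions can be attached back to test_sf as a new column, since its row order is unchanged.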