Any help with the following is really appreciated!
My goal: I need to run a lasso model for variable selection on my data, which is in sf polygon format.
My data: As noted above, it is an sf object, specifically a shapefile with polygons.
I have tried using both ffs and train, but neither works.
Here is a reproducible example, with a multipolygon shapefile.
Please ignore any possible temporal relationship between the variables that end in "74" and the ones that end in "79".
library(sf)
library(caret) #provides train() and trainControl()
library(CAST)
#Loading data
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#Training and test data
set.seed(100)
ind <- sample(2, nrow(nc), replace = TRUE, prob = c(0.7, 0.3))
train <- nc[ind==1,]
test <- nc[ind==2,]
predictors <- c("SID74","BIR79","BIR74")
response <- "NWBIR79"
## 1st option ##
#==============#
#ffs Forward feature selection
set.seed(10)
ffs(train[,predictors], train$NWBIR79,method = "lasso")
[1] "model using SID74,BIR79 will be trained now..."
Something is wrong; all the RMSE metric values are missing:
RMSE Rsquared MAE
Min. : NA Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA Median : NA
Mean :NaN Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA Max. : NA
NA's :3 NA's :3 NA's :3
Error: Stopping
In addition: There were 26 warnings (use warnings() to see them)
## 2nd option ##
#==============#
#model without ffs
set.seed(100)
model <- train(train[,predictors], train$NWBIR79, method="lasso", trControl=trainControl(method = "cv"),importance=T)
Something is wrong; all the RMSE metric values are missing:
RMSE Rsquared MAE
Min. : NA Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA Median : NA
Mean :NaN Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA Max. : NA
NA's :3 NA's :3 NA's :3
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)
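Two things lead to the error:
The geometries are currently part of the predictors: subsetting columns of an sf object keeps the geometry column, so train[,predictors] still carries it. Drop the geometries first: st_drop_geometry(train[,predictors])
"importance" is not a parameter of the lasso method, so remove importance=T from the call.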
model <- train(st_drop_geometry(train[,predictors]), train$NWBIR79, method="lasso", trControl=trainControl(method = "cv"))
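To see why the geometry ends up among the predictors, a quick check along these lines may help. It is only a sketch on the nc example data; the names with_geom and no_geom are made up for illustration.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
predictors <- c("SID74", "BIR79", "BIR74")

# Column subsetting on an sf object keeps the geometry column ("sticky" geometry)
with_geom <- nc[, predictors]
names(with_geom)   # the three predictors plus the geometry column
class(with_geom)   # still an sf object

# st_drop_geometry() returns a plain data.frame with only the attribute columns
no_geom <- st_drop_geometry(with_geom)
names(no_geom)
class(no_geom)
```

Passing no_geom instead of with_geom to train means the model only ever sees numeric columns.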
This should work with CAST::ffs in the same way.
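For completeness, the ffs version could look roughly like the sketch below. It assumes the elasticnet package (used by caret's "lasso" method) is installed; train_sf, train_x and ffs_model are illustrative names, and I have not checked which variables get selected on this data.

```r
library(sf)
library(caret)
library(CAST)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Same 70/30 split as in the question
set.seed(100)
ind <- sample(2, nrow(nc), replace = TRUE, prob = c(0.7, 0.3))
train_sf <- nc[ind == 1, ]
predictors <- c("SID74", "BIR79", "BIR74")

# Plain data.frame of predictors, geometry removed
train_x <- st_drop_geometry(train_sf[, predictors])

# Forward feature selection with the lasso method and 10-fold CV
set.seed(10)
ffs_model <- ffs(
  predictors = train_x,
  response   = train_sf$NWBIR79,
  method     = "lasso",
  trControl  = trainControl(method = "cv")
)

# Variables kept by the forward selection
ffs_model$selectedvars
```

Naming the training subset train_sf rather than train also avoids confusion with caret's train function.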