r feature-selection survival-analysis cox-regression

How do I train a penalized CoxPH model with restricted cubic splines in R?


I am trying to train a time-independent Cox model on a dataset of ~750,000 rows, as well as a time-dependent one on several million rows. I have 19 variables, some binary and some continuous. I have been modelling the continuous variables with restricted cubic splines (the rcs function; pspline was giving me numerical issues for some reason) as an easy way to deal with non-linearity. Overfitting hasn't been a problem, judging by the concordance index on a test set. However, I want to do variable selection on my models, preferably by (LASSO) regularization, as I think best subset selection will likely be too computationally heavy. The survival package in R has a ridge function, but it has to be applied to individual variables and I'd like to penalize the whole model. There's also cv.glmnet with family="cox", but I believe this doesn't allow me to use splines. Is there a nice way to do this?


Solution

  • The penalized package for R fits Cox PH regressions with L1 (lasso) and L2 (ridge) penalties. It appears to work with bs from the splines package, but I have not tested it with rcs or other splines. I did fit a model using pspline in the formula and it gave an answer (and an optimum L1 penalty), but I don't know how this compares to the regular penalized spline approach in survival.
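A minimal sketch of the approach described above, on simulated data. The variable names (x1, x2, b1), the sample size, and the choice of df = 4 are made up for illustration. As an assumption for robustness, the spline terms are expanded into explicit basis columns with model.matrix and passed to penalized as a matrix rather than as a formula. The lasso penalty then acts on each basis coefficient separately, so a continuous variable is only dropped entirely if all of its basis columns shrink to zero.

```r
library(survival)   # Surv()
library(splines)    # bs()
library(penalized)  # optL1(), coefficients()

set.seed(1)

# Simulated data (hypothetical): x1 has a non-linear effect, x2 has none.
n <- 300
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  b1 = rbinom(n, 1, 0.5)
)
haz     <- exp(0.8 * d$x1^2 + 0.5 * d$b1 - 1)
t_event <- rexp(n, rate = haz)
t_cens  <- rexp(n, rate = 0.3)
d$time   <- pmin(t_event, t_cens)
d$status <- as.integer(t_event <= t_cens)

# Expand the spline terms into basis columns; drop the intercept column,
# since the Cox model has no intercept.
X <- model.matrix(~ bs(x1, df = 4) + bs(x2, df = 4) + b1, data = d)[, -1]
y <- Surv(d$time, d$status)

# Choose the L1 penalty by 5-fold cross-validated partial likelihood.
cv <- optL1(y, penalized = X, fold = 5, model = "cox",
            standardize = TRUE, trace = FALSE)

cv$lambda                          # selected lambda1
coefficients(cv$fullfit, "all")    # fit on all data at that lambda1
```

Since natural cubic splines are the same thing as restricted cubic splines, splines::ns could presumably be substituted for bs in the model.matrix call. On data of the size mentioned in the question, the cross-validation step may be slow, so it is probably worth timing it on a subsample first.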