
Feature elimination to screen for multiple models using tidymodels


I am currently doing regression modeling on a dataset that has more features (p) than observations (n), typically p = 10000 and n = 30. I'd also like to test many models and find the best one.

What I'm doing now is eliminating features first, reducing them from 10K to 20-30 with step_select_mrmr() or step_select_vip(). I do that by placing the selection step at the top of my pipeline. Then I proceed with testing many models.

Is this approach reasonable?


Solution

  • It is reasonable, as long as the feature selection is run inside resampling (or checked against a validation set) so that no information leaks from the data used to measure performance into the filter.

    We hope to have more recipe functions for supervised filters later this year, but Steven's are great.
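    As a rough illustration of the point about leakage, here is a minimal sketch that keeps the filter as a recipe step and screens several models under the same resamples. It rests on several assumptions: the data are simulated and hypothetical; step_select_mrmr() is assumed to come from the recipeselectors package (installed from GitHub, with praznik available) and to accept `outcome` and `top_p` arguments; the two models are placeholders chosen only because their engines ship with R.

```r
library(tidymodels)
library(recipeselectors)  # assumption: provides step_select_mrmr()

set.seed(1)

# Hypothetical p >> n data: 30 rows, 1000 predictors, only x1..x3 matter
n <- 30
p <- 1000
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("x", seq_len(p))))
dat <- as_tibble(x) %>%
  mutate(y = 2 * x1 - x2 + 0.5 * x3 + rnorm(n))

# The supervised filter lives in the recipe, so it is re-estimated
# on the analysis portion of every resample and never sees the
# assessment rows. Argument names here are assumptions.
rec <- recipe(y ~ ., data = dat) %>%
  step_select_mrmr(all_predictors(), outcome = "y", top_p = 10)

# Placeholder candidate models (lm and rpart engines)
models <- list(
  lm   = linear_reg(),
  tree = decision_tree(mode = "regression")
)

# One workflow per (recipe, model) pair
wf_set <- workflow_set(preproc = list(mrmr = rec), models = models)

folds <- vfold_cv(dat, v = 5)

# Fit every workflow on the same folds and compare by RMSE
results <- workflow_map(
  wf_set,
  "fit_resamples",
  resamples = folds,
  metrics = metric_set(rmse)
)

rank_results(results, rank_metric = "rmse")
```

    The design choice that matters is that the selection step is part of each workflow rather than applied once to the full dataset beforehand. Running the filter on all 30 rows and then resampling the reduced data would let the assessment rows influence which features are kept, which tends to make every model look better than it is.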