rr-formula

short formula call for many variables when building a model


I am trying to build a regression model with lm(...). My dataset has many features (more than 50), and I do not want to write out every term like this:

lm(output ~ feature1 + feature2 + feature3 + ... + feature70)

Is there a shorthand notation for writing this formula?


Solution

  • You can use . as described in the help page for formula. The . stands for "all columns not otherwise in the formula".

    lm(output ~ ., data = myData)

    Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:

    xnam <- paste("x", 1:25, sep="")
    (fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
    

    You can then pass this formula object to a regression function: lm(fmla, data = myData).
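
    As a rough sketch of both approaches, here they are on R's built-in mtcars data set. The data set, the response mpg, and the object names are only assumptions for illustration, so substitute your own. It also shows two variations not mentioned above: dropping columns from . with a minus sign, and building the formula with reformulate() from the stats package instead of pasting strings by hand.

    ```r
    # Illustration only: mtcars and the response "mpg" stand in for your own data.

    # Regress mpg on every other column of the data frame.
    fit_all <- lm(mpg ~ ., data = mtcars)

    # Same, but leave out two columns by subtracting them from the dot.
    fit_drop <- lm(mpg ~ . - wt - qsec, data = mtcars)

    # Build the formula from the column names rather than pasting strings.
    predictors <- setdiff(names(mtcars), "mpg")
    fmla <- reformulate(predictors, response = "mpg")
    fit_built <- lm(fmla, data = mtcars)
    ```

    The . form is the shortest when you want nearly every column. A built formula is handier when the predictor names come from a vector you compute, for example after filtering columns by name or type.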