r regression best-fit-curve variable-selection

Stepwise AIC using forward selection in R


I am trying to do forward variable selection using stepwise AIC in R, but I am not getting the results I expect. Specifically, the function should start with no variables and add them one at a time, reporting the AIC at each step. However, when I run it I only get an AIC value for the model containing all variables. Where am I going wrong? Here is my code:


model.full <- lm(distance ~ ., data = FAA_unique_without_speed_air)
model.null<-lm(distance ~ 1,  data = FAA_unique_without_speed_air)
modAIC <- MASS::stepAIC(model.full,direction='forward', scope=model.full, k = 2)

Output:

Start:  AIC=9161.49
distance ~ aircraft + duration + no_pasg + speed_ground + height + 
    pitch

Solution

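  • I think it would be best to be explicit with the arguments of stepAIC rather than rely on the defaults. Your call starts from model.full, so in the forward direction there are no terms left to add and the search stops at the starting model. Try: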

    1. Provide the null model as the initial model object when you want to do forward selection.

    2. Provide both a lower and upper search formula in the scope.

    For example, using the iris data frame from the built-in datasets package:

    library(MASS)
    
    model.full <- lm(Sepal.Width ~ ., data = iris)
    model.null <- lm(Sepal.Width ~ 1, data = iris)
    
    MASS::stepAIC(model.null, direction = "forward", scope = list(lower = model.null,
                                                                  upper = model.full))
    

    Or, if you pass a single formula as scope, it is taken as the upper model (the lower model is empty), so you should list every candidate term explicitly:

    stepAIC(model.null, direction = "forward", scope = ~ Sepal.Length + Species + Petal.Length + Petal.Width)
    

    However, as @BenBolker mentioned, you should post a reproducible example with your data so that we can confirm.
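To see why the starting model matters, here is a minimal sketch on iris that runs the forward search from both starting points and compares the recorded steps. The object names (full_fit, null_fit, stuck, fwd) are only illustrative, and trace = 0 is used just to silence the printed log; I have not run this against your FAA data, so treat it as a sketch rather than a confirmed fix.

```r
library(MASS)

# Illustrative names; iris stands in for the real data.
full_fit <- lm(Sepal.Width ~ ., data = iris)
null_fit <- lm(Sepal.Width ~ 1, data = iris)

# Forward search started from the full model: every term in the upper
# scope is already in the model, so no step can be taken.
stuck <- stepAIC(full_fit, direction = "forward",
                 scope = formula(full_fit), trace = 0)

# Forward search started from the intercept-only model, with explicit
# lower and upper formulas.
fwd <- stepAIC(null_fit, direction = "forward",
               scope = list(lower = formula(null_fit),
                            upper = formula(full_fit)),
               trace = 0)

stuck$anova   # only the starting model is listed
fwd$anova     # one row for the start, plus one row per term added
formula(fwd)  # the model the forward search ended on
```

The $anova component records the path the search took, so comparing the two tables makes it clear that the first call never moved away from its starting model.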