r formula

Meaning of "." in model formula vs how it is documented in ?formula


In ?formula it says:

There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’.

My reading of the first part of that bit of documentation would have led me to assume that this code:

dat <- data.frame(
  y = rnorm(10),
  x1 = runif(10),
  x2 = rbinom(10,size = 1,prob = 0.5),
  x3 = rbinom(10,size = 1,prob = 0.5)
)

mt <- terms.formula(
  x = y ~ x1 + . + (.)^2,
  data = dat
)

mm <- model.matrix(mt,dat)

...would yield a model matrix with an interaction term only for x2:x3, as those are the only two columns in dat "not otherwise in the formula". However:

> colnames(mm)
[1] "(Intercept)" "x1"          "x2"          "x3"          "x1:x2"      
[6] "x1:x3"       "x2:x3"  

...instead we get all the interactions.

If I write it out explicitly, of course, I get what I expect:

> mt1 <- terms.formula(
+   x = y ~ x1 + x2 + x3 + (x2 + x3)^2,
+   data = dat
+ )
> 
> mm1 <- model.matrix(mt1,dat)
> colnames(mm1)
[1] "(Intercept)" "x1"          "x2"          "x3"          "x2:x3"  

I know that formulas & model matrices are sometimes subtly confusing, but I'm having a hard time reconciling my reading of the documentation with the behavior.

Am I interpreting the documentation incorrectly, or possibly writing the formula incorrectly (for what I'm trying to do)? Or is the documentation not entirely accurate?


Solution

  • It looks like "not otherwise in the formula" might really mean "not on the left-hand side of the formula": e.g.

    terms.formula( y + x1 ~ x1 + .^2, data = dat)
    

    (while silly) doesn't include x1 in the interactions. On the other hand, adding offset(x1) to the right-hand side doesn't count: x1 still takes part in the interactions.

    The internal code for terms is scary, but this comment

    /* If there is a dotsxp being expanded then we need to see whether any of the variables in the data frame match with the variable on the lhs. If so they shouldn't be included in the factors */

    strengthens the conclusion (note the phrase "on the lhs").

    For what it's worth, this also works to exclude x1 from the interaction:

    terms.formula( y ~ x1 + (.-x1)^2 , data = dat)
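
To check these claims side by side, here is a minimal sketch that compares the expanded term labels of each formula. It reuses the column names from the question; `term_labels` is a hypothetical helper introduced only for this illustration, and `set.seed(1)` is there only to make the toy data reproducible.

```r
# Toy data with the same column names as in the question.
set.seed(1)
dat <- data.frame(
  y  = rnorm(10),
  x1 = runif(10),
  x2 = rbinom(10, size = 1, prob = 0.5),
  x3 = rbinom(10, size = 1, prob = 0.5)
)

# Hypothetical helper: term labels after "." has been expanded against `data`.
term_labels <- function(formula, data) {
  attr(terms(formula, data = data), "term.labels")
}

# "." expands to every column that is not on the left-hand side,
# so x1 takes part in the interactions even though it is named explicitly.
labels_all <- term_labels(y ~ x1 + . + (.)^2, dat)

# Removing x1 inside the parentheses keeps it out of the interactions.
labels_minus <- term_labels(y ~ x1 + (. - x1)^2, dat)

# The explicit formula from the question, for comparison.
labels_explicit <- term_labels(y ~ x1 + x2 + x3 + (x2 + x3)^2, dat)

# The (silly) variant with x1 also on the left-hand side.
labels_lhs <- term_labels(y + x1 ~ x1 + .^2, dat)

print(labels_all)
print(labels_minus)
print(labels_lhs)

# TRUE if the "(. - x1)^2" form yields the same terms as the explicit formula.
print(setequal(labels_minus, labels_explicit))
```

If the reading above is right, `labels_all` should contain `x1:x2` and `x1:x3`, while `labels_minus` should match the explicitly written formula.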