Tags: r, lme4, r-formula

What does the ( | ) syntax mean in an R formula?


I am following a tutorial and came across the following syntax:

# assume 'S' is the name of the subjects column
# assume 'X1' is the name of the first factor column
# assume 'X2' is the name of the second factor column
# assume 'X3' is the name of the third factor column
# assume 'Y' is the name of the response column
# run the ART procedure on 'df'

# linear mixed model syntax; see lme4::lmer
m = art(Y ~ X1 * X2 * X3 + (1|S), data=df) 

anova(m)

I am a bit confused by the (|) syntax. I looked at the documentation for the linear mixed model syntax in lme4::lmer, and found: "Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors".

So I assume 1 and S here are two random-effects terms. S makes sense as a random effect, since it is a random variable that could stand for participant. But how is 1 a random variable? What do the 1 and the | mean here?


Solution

  • The | symbol is used in formulas in different ways by different functions. In linear mixed models, it's used to denote random effects. There are several kinds of random-effects terms that can be used in mixed models: random intercepts, random slopes, and combinations of the two (with correlated or uncorrelated intercepts and slopes).

    The expression on the left-hand side of the | (the 1 in your formula) specifies which of these to use. Here are some examples, taken from my book:

    library(lme4)
    # Random intercept:
    m1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
    
    # Random slope:
    m2 <- lmer(Reaction ~ Days + (0 + Days|Subject), data = sleepstudy)
    
    # Correlated random intercept and slope:
    m3 <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)
    
    # Uncorrelated random intercept and slope:
    m4 <- lmer(Reaction ~ Days + (1|Subject) + (0 + Days|Subject),
               data = sleepstudy)
    

    So in your example, (1|S) adds a random intercept: each level of S (each subject) gets its own intercept.
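
    To see what that means in practice, you can inspect the fitted per-group intercepts. Using the same sleepstudy data, ranef() shows each subject's deviation from the overall intercept, and coef() combines the fixed and random parts:

    ```r
    library(lme4)

    # Random-intercept model on the built-in sleepstudy data
    m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

    # The overall (fixed) intercept:
    fixef(m1)["(Intercept)"]

    # Per-subject deviations from that intercept (the random effects):
    head(ranef(m1)$Subject)

    # coef() adds the two together: one fitted intercept per subject
    head(coef(m1)$Subject)
    ```

    The per-subject intercept in coef() is always the fixed intercept plus that subject's random effect; only the deviations are modeled as random, which is why the random-effects term is written as 1 (an intercept) rather than as a variable.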

    A similar but notationally different use of | can be found in formulas for lmtree from partykit, which fits decision trees with linear models in the nodes. In that case, the formula looks like y ~ x1 + x2 | z1 + z2 + z3, where y is the response variable, the x variables are the regressors in the node-wise linear models, and the z variables are the partitioning variables used for building the tree.
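
    A minimal sketch of that second usage, using simulated (hypothetical) data in which the slope of y on x depends on a partitioning variable z1:

    ```r
    library(partykit)

    # Simulated data: the x-y relationship flips depending on z1
    set.seed(1)
    d <- data.frame(x = runif(200), z1 = runif(200), z2 = gl(2, 100))
    d$y <- ifelse(d$z1 > 0.5, 1 + 3 * d$x, 1 - 3 * d$x) + rnorm(200, sd = 0.3)

    # y ~ x is the linear model fit in each node;
    # z1 + z2 are the variables considered for splitting
    tree <- lmtree(y ~ x | z1 + z2, data = d)
    print(tree)
    ```

    Here | separates the model part of the formula from the partitioning part, rather than marking a random effect as in lmer.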