
Condition ( | ) in R formula


I found this pdf on R formulas, and I cannot figure out how | works (see the table on the second page). I could not find any explanation on the web either. It appears from time to time in lists of possible formula symbols, but never with an example.

I suspect it might be outdated, since there are now other ways to achieve whatever it did.

Does anybody know how to use | in a formula and what exactly it does?

Here is a bit of code that shows my clumsy attempt to use |:

x <- rnorm(100)
y <- rnorm(100)
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)

lm(y ~ x|z)

Solution

  • The symbol | means different things depending on the context:

    The general case

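    In general, | means OR. Base R modeling functions such as lm() do not treat | as a formula operator, so any | is evaluated as the ordinary logical operator and its result is used as a single variable. This is the same thing that happens when you protect an arithmetic operator with I(), as in:

    lm(y ~ x + I(x^2))
    

    The expression is evaluated first, and the resulting variable is then used to construct the model matrix and do the fitting. (Without I(), ^ would be read as the formula crossing operator, so y ~ x + x^2 would simply fit y ~ x.)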
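    A minimal sketch to check how lm() treats operators inside a formula, assuming simulated data (the seed and the quadratic data-generating model are hypothetical, chosen only for illustration). Wrapping x^2 in I() makes R compute it as arithmetic, whereas a bare x^2 is interpreted by the formula language itself:

    ```r
    set.seed(1)  # hypothetical seed, only for reproducibility
    x <- rnorm(100)
    y <- 1 + 2 * x + 0.5 * x^2 + rnorm(100)  # hypothetical quadratic relationship

    # Bare ^ is the formula crossing operator: x^2 expands to just x, so this is y ~ x
    fit_caret <- lm(y ~ x + x^2)

    # I() protects ^, so x^2 is computed first and enters the model as its own column
    fit_I <- lm(y ~ x + I(x^2))

    print(names(coef(fit_caret)))  # "(Intercept)" "x"
    print(names(coef(fit_I)))      # "(Intercept)" "x" "I(x^2)"
    ```

    The same mechanism applies to |: since the formula language reserves no meaning for it, lm() evaluates it as plain R code.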

    In your code, | also means OR. Keep in mind that R also interprets numeric values as logical when you apply a logical operator to them: 0 is treated as FALSE, and any other number as TRUE.

    So your call to lm models y as a function of "x OR z", which is not a meaningful predictor. Because x comes from rnorm(), none of its values is exactly 0, so x | z is TRUE for every observation and the model is effectively y ~ TRUE. That is also why the fit is degenerate: the model matrix has two columns of 1's, one for the intercept and one for the only observed value of x | z, TRUE. These two columns are perfectly collinear, so the coefficient for x | z cannot be estimated, as the output shows:

    > lm(y ~ x|z)
    
    Call:
    lm(formula = y ~ x | z)
    
    Coefficients:
    (Intercept)    x | zTRUE  
       -0.01925           NA  
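    A minimal sketch reproducing this with the question's simulated data (the seed is hypothetical, added only so the run is reproducible). terms() shows that base R keeps x | z as one ordinary term, and evaluating it gives a vector that is TRUE everywhere:

    ```r
    set.seed(42)  # hypothetical seed
    x <- rnorm(100)
    y <- rnorm(100)
    z <- sample(c(TRUE, FALSE), 100, replace = TRUE)

    # | is not a formula operator, so x | z is kept as a single term
    tl <- attr(terms(y ~ x | z), "term.labels")
    print(tl)  # "x | z"

    # Evaluated as logical OR: every nonzero number counts as TRUE
    xz <- x | z
    print(table(xz))

    # The x | z column duplicates the intercept column, so its coefficient is NA
    fit <- lm(y ~ x | z)
    print(coef(fit))
    ```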
    

    Inside formulas for mixed models

    In mixed models (e.g. the lme4 package), | is used to specify a random effect. A term like + (1 | X) means: "fit a random intercept for every level of X". You can read | as "given", so the term says "fit an intercept, given X". With that in mind, the use of | in the specification of correlation structures in, e.g., nlme or mgcv will make more sense.
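    A minimal sketch of both uses in nlme, which ships with standard R installations. The Orthodont data set (a distance measured on 27 children at four ages) comes with nlme; the models themselves are only illustrative. In both calls, ~ 1 | Subject reads as "an intercept, given Subject":

    ```r
    library(nlme)

    # Random intercept for every Subject
    fit_lme <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
    print(nrow(ranef(fit_lme)))  # one estimated random intercept per subject

    # AR(1) correlation among the repeated measurements within each Subject
    fit_gls <- gls(distance ~ age, data = Orthodont,
                   correlation = corAR1(form = ~ 1 | Subject))
    print(coef(fit_gls))

    # lme4 equivalent of the random intercept (not run here, requires lme4):
    # lme4::lmer(distance ~ age + (1 | Subject), data = nlme::Orthodont)
    ```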

    You still have to be careful, because exactly how | is interpreted depends on the package you use. The only reliable way to know what it means for a given modeling function is to check that package's documentation.

    Other uses

    Some other functions and packages also use the | symbol in a formula interface. Here, too, it usually indicates some kind of grouping. One example is the lattice graphics system, where | is used for conditioning (faceting), as in the following code:

    library(lattice)
    densityplot(~Sepal.Width|Species,
                data = iris,
                main="Density Plot by Species",
                xlab="Sepal width")