r prediction lm emmeans

Is it possible to specify unique combinations of continuous predictor variables using emmeans() with a single call?


I am using emmeans() in R to estimate marginal means after fitting a linear model with a single knot. I know the values at which a predictor is evaluated can be set explicitly with at = list(x = c()). My model requires defining a new variable for the knot, which makes constructing this at = list more complicated.

Reproducible Example:

#
library(dplyr)
library(emmeans)
set.seed(080723)

# Simulate data
n <- 1000
df <- data.frame(x = runif(n, 0, 30))
df <- df%>%
  rowwise()%>%
  mutate(y = case_when(
  x <= 15 ~ (2*x) + rnorm(1, 0, 5),
  x > 15 ~ (2*x) + (5*(x-15)) + rnorm(1, 0, 5)
  ))%>%
  ungroup()

# Plot the simulated data
plot(df$x, df$y, pch = 16, col = "blue", xlab = "x", ylab = "y")

# Manually add a covariate to act as a spline in regression
df <- df%>%mutate(xk_15 = case_when(
  x <= 15 ~ 0,
  x > 15 ~ x-15
))

# LM using that covariate
m1 <- lm(y ~ x + xk_15, data = df)
summary(m1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.53337    0.39976  -1.334    0.182    
x            2.01647    0.03994  50.493   <2e-16 ***
xk_15        4.99538    0.07298  68.452   <2e-16 ***

Now, suppose I would like the estimated marginal mean for y when x = 5, 10, 15, and 20. However, given my model formulation I also need to specify xk_15. To match the desired values of x, the values for xk_15 should be 0, 0, 0, 5, respectively.

I can accomplish this by running emmeans() twice.

First,

emmeans(m1, "x", at = list(x=c(5,10,15), xk_15=c(0)))

  x emmean    SE  df lower.CL upper.CL
  5   9.55 0.248 997     9.06     10.0
 10  19.63 0.206 997    19.23     20.0
 15  29.71 0.322 997    29.08     30.3

and then,

emmeans(m1, "x", at = list(x=c(20), xk_15=c(5)))

 x emmean    SE  df lower.CL upper.CL
 20   64.8 0.203 997     64.4     65.2

My question is: is there a way to accomplish this in a single call? Conceptually, something like emmeans(m1, "x", at = list(x = c(5,10,15,20), xk_15 = c(0,0,0,5))), where the elements of each at = vector would be matched position by position. Note that this code does not work that way.

I present the spline setup because it is what motivates my question, but I imagine this generalizes to any two continuous predictor variables used with emmeans(). I can of course get the desired estimated marginal means by calling emmeans() twice as shown, but it would be nice to specify particular combinations of continuous predictors in a single call, especially when there are many combinations of interest.


Solution

  • You can call emmeans a single time using both variables and filter out the rows you don't want:

    emmeans(m1, specs = c("x", "xk_15"), 
            at = list(x = c(5, 10, 15, 20), xk_15 = c(0, 5))) %>%
      as_tibble() %>%
      filter((x < 20 & xk_15 == 0) | (x == 20 & xk_15 == 5))
    #> # A tibble: 4 x 7
    #>       x xk_15 emmean    SE    df lower.CL upper.CL
    #>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
    #> 1     5     0   10.0 0.256   997     9.51     10.5
    #> 2    10     0   19.9 0.207   997    19.5      20.3
    #> 3    15     0   29.7 0.324   997    29.1      30.3
    #> 4    20     5   64.8 0.209   997    64.4      65.2
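
    The filter condition above is written by hand for these four points. A minimal sketch of a more general rule, assuming the knot stays at 15, is to keep only the rows where xk_15 equals pmax(x - 15, 0). The grid here is built with base R's expand.grid() purely to illustrate the crossing of the at = vectors; the names grid and valid are illustrative only.

    ```r
    # Crossed grid, analogous to what at = list(x = ..., xk_15 = ...) produces
    grid <- expand.grid(x = c(5, 10, 15, 20), xk_15 = c(0, 5))

    # Keep only rows consistent with the knot definition xk_15 = max(x - 15, 0)
    valid <- grid[grid$xk_15 == pmax(grid$x - 15, 0), ]
    valid
    ```

    In the pipeline above, the same condition could be used as filter(xk_15 == pmax(x - 15, 0)). With non-integer values, an exact == comparison may need a small tolerance instead.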
    

    If this is something you want to do frequently with an lm, it's straightforward to get the same result without emmeans, for example with the following function:

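    emmeans2 <- function(obj, data) {
      data %>%
        # predict() with se.fit = TRUE returns a list; elements 1 to 3 are fit, se.fit, df
        cbind(predict(object = obj, newdata = ., se.fit = TRUE)[1:3]) %>%
        # 1.96 is the normal approximation to the 95% t quantile
        mutate(lower.CL = fit - 1.96 * se.fit, upper.CL = fit + 1.96 * se.fit) %>%
        rename(emmean = fit, SE = se.fit)
    }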
      
    emmeans2(m1, data.frame(x=c(5, 10, 15, 20), xk_15=c(0, 0, 0, 5)))
    #>    x xk_15   emmean        SE  df  lower.CL upper.CL
    #> 1  5     0 10.01255 0.2557274 997  9.511327 10.51378
    #> 2 10     0 19.86030 0.2067146 997 19.455142 20.26546
    #> 3 15     0 29.70805 0.3240371 997 29.072940 30.34317
    #> 4 20     5 64.78476 0.2085883 997 64.375931 65.19360
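
    The helper above uses 1.96, the large-sample normal quantile, which is close to the t quantile at 997 degrees of freedom. For smaller samples, a variant with t-based limits may be preferable. The sketch below gets them from predict() with interval = "confidence"; emmeans_t and the toy data are hypothetical and used only for illustration.

    ```r
    # Hypothetical helper: like emmeans2, but with t-based confidence limits
    emmeans_t <- function(obj, data, level = 0.95) {
      # interval = "confidence" returns a matrix with columns fit, lwr, upr
      ci <- predict(obj, newdata = data, interval = "confidence", level = level)
      # se.fit = TRUE returns a list holding the standard errors and residual df
      p <- predict(obj, newdata = data, se.fit = TRUE)
      cbind(data,
            emmean = ci[, "fit"], SE = p$se.fit, df = p$df,
            lower.CL = ci[, "lwr"], upper.CL = ci[, "upr"])
    }

    # Toy data with the same structure as the question (knot at 15)
    set.seed(1)
    toy <- data.frame(x = runif(50, 0, 30))
    toy$xk_15 <- pmax(toy$x - 15, 0)
    toy$y <- 2 * toy$x + 5 * toy$xk_15 + rnorm(50, 0, 5)
    fit <- lm(y ~ x + xk_15, data = toy)

    res <- emmeans_t(fit, data.frame(x = c(5, 10, 15, 20), xk_15 = c(0, 0, 0, 5)))
    res
    ```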
    

    Of course, this only does a tiny fraction of what emmeans can do, but it works perfectly well in simple cases.
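
    Another option, sketched here under the assumption that refitting the model is acceptable, is to put the knot term in the formula as pmax(x - 15, 0) instead of storing it in a separate column. Prediction then only needs x, because R rebuilds the knot term from it. emmeans generally evaluates formula terms on its reference grid, so at = list(x = ...) alone is expected to be enough, but treat that call as an assumption to verify; m_col, m_fun, and the toy data are illustrative names.

    ```r
    # Toy data with the same structure as the question (knot at 15)
    set.seed(1)
    toy <- data.frame(x = runif(200, 0, 30))
    toy$xk_15 <- pmax(toy$x - 15, 0)
    toy$y <- 2 * toy$x + 5 * toy$xk_15 + rnorm(200, 0, 5)

    m_col <- lm(y ~ x + xk_15, data = toy)            # knot as a separate column
    m_fun <- lm(y ~ x + pmax(x - 15, 0), data = toy)  # knot computed in the formula

    pts <- data.frame(x = c(5, 10, 15, 20))
    p1 <- predict(m_col, newdata = transform(pts, xk_15 = pmax(x - 15, 0)))
    p2 <- predict(m_fun, newdata = pts)               # only x is needed
    all.equal(unname(p1), unname(p2))

    # Assumption: emmeans evaluates the pmax() term on its reference grid
    if (requireNamespace("emmeans", quietly = TRUE)) {
      print(emmeans::emmeans(m_fun, "x", at = list(x = c(5, 10, 15, 20))))
    }
    ```

    The two fits describe the same model, so their predictions agree; the difference is only in which variables have to be supplied when predicting.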