I am using emmeans() in R to estimate marginal means after fitting a linear model with a single knot. I know the values at which a predictor is evaluated can be set explicitly with at = list(x = c()). My model requires defining a new variable for the knot, which makes constructing this at = argument more complicated.
Reproducible Example:
library(dplyr)
library(emmeans)
set.seed(080723)
# Simulate data
n <- 1000
df <- data.frame(x = runif(n, 0, 30))
df <- df %>%
  rowwise() %>%
  mutate(y = case_when(
    x <= 15 ~ (2 * x) + rnorm(1, 0, 5),
    x > 15 ~ (2 * x) + (5 * (x - 15)) + rnorm(1, 0, 5)
  )) %>%
  ungroup()
# Plot the simulated data
plot(df$x, df$y, pch = 16, col = "blue", xlab = "x", ylab = "y")
# Manually add a covariate to act as a spline in regression
df <- df %>% mutate(xk_15 = case_when(
  x <= 15 ~ 0,
  x > 15 ~ x - 15
))
# LM using that covariate
m1 <- lm(y ~ x + xk_15, data = df)
summary(m1)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.53337    0.39976  -1.334    0.182
x            2.01647    0.03994  50.493   <2e-16 ***
xk_15        4.99538    0.07298  68.452   <2e-16 ***
Now, suppose I would like the estimated marginal mean for y when x = 5, 10, 15, and 20. However, given my model formulation I also need to specify xk_15. To match the desired values of x, the values for xk_15 should be 0, 0, 0, 5, respectively.
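The mapping from x to xk_15 does not have to be worked out by hand. A minimal sketch in base R, assuming the knot stays at 15 (the names knot, x_vals and xk_vals are only for illustration):

```r
# Knot location used when building xk_15 in the model above
knot <- 15

# Values of x at which estimates are wanted
x_vals <- c(5, 10, 15, 20)

# xk_15 is 0 up to the knot and (x - knot) beyond it
xk_vals <- pmax(x_vals - knot, 0)

# One row per (x, xk_15) combination of interest
combos <- data.frame(x = x_vals, xk_15 = xk_vals)
combos
```

This reproduces the 0, 0, 0, 5 pattern listed above and scales to any number of x values.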
I can accomplish this by running emmeans() twice.
First,
emmeans(m1, "x", at = list(x=c(5,10,15), xk_15=c(0)))
  x emmean    SE  df lower.CL upper.CL
  5   9.55 0.248 997     9.06     10.0
 10  19.63 0.206 997    19.23     20.0
 15  29.71 0.322 997    29.08     30.3
and then,
emmeans(m1, "x", at = list(x=c(20), xk_15=c(5)))
  x emmean    SE  df lower.CL upper.CL
 20   64.8 0.203 997     64.4     65.2
My question is: Is there a way to accomplish this in a single call?
Conceptually, something like emmeans(m1, "x", at = list(x=c(5,10,15,20), xk_15=c(0,0,0,5))), where the elements of each at = vector would be matched pairwise. Note that this code does not work.
I present the spline setup because it is what motivates the question, but I imagine this generalizes to any two continuous predictors with emmeans(). Of course, I can get the desired estimated marginal means by calling emmeans() twice as shown, but it would be nice to specify particular combinations of continuous predictors in a single call, especially if there were many combinations of interest.
You can call emmeans a single time using both variables and filter out the rows you don't want:
emmeans(m1, specs = c("x", "xk_15"),
        at = list(x = c(5, 10, 15, 20), xk_15 = c(0, 5))) %>%
  as_tibble() %>%
  filter((x < 20 & xk_15 == 0) | (x == 20 & xk_15 == 5))
#> # A tibble: 4 x 7
#>       x xk_15 emmean    SE    df lower.CL upper.CL
#>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1     5     0   10.0 0.256   997     9.51     10.5
#> 2    10     0   19.9 0.207   997    19.5      20.3
#> 3    15     0   29.7 0.324   997    29.1      30.3
#> 4    20     5   64.8 0.209   997    64.4      65.2
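With many combinations, a hand-written filter() condition gets unwieldy. One possible variation, sketched here under the assumption that the combinations are kept in a data frame (wanted is a hypothetical name, and the small simulated data set exists only so the example runs on its own), is to build the full grid and keep the wanted rows with a join:

```r
library(dplyr)
library(emmeans)

# Small simulated data set, only so the sketch is self-contained
set.seed(1)
n <- 200
df <- data.frame(x = runif(n, 0, 30))
df$xk_15 <- pmax(df$x - 15, 0)
df$y <- 2 * df$x + 5 * df$xk_15 + rnorm(n, 0, 5)
m1 <- lm(y ~ x + xk_15, data = df)

# Hypothetical lookup table: one row per combination of interest
wanted <- data.frame(x = c(5, 10, 15, 20), xk_15 = c(0, 0, 0, 5))

# Build the full grid from the unique values, then keep only the wanted rows
res <- emmeans(m1, specs = c("x", "xk_15"),
               at = list(x = unique(wanted$x),
                         xk_15 = unique(wanted$xk_15))) %>%
  as_tibble() %>%
  inner_join(wanted, by = c("x", "xk_15"))
res
```

The grid contains every pairing of the unique values, so it grows with the product of their counts; the join discards the pairings that were not requested.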
Though if this is something you want to do frequently with an lm, it's straightforward to get the same result without emmeans, for example with the following function:
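# Minimal stand-in for emmeans(): predictions with SEs at user-supplied rows
emmeans2 <- function(obj, data) {
  data %>%
    # predict() returns a list; elements 1:3 are fit, se.fit and df
    cbind(predict(object = obj, newdata = ., se.fit = TRUE)[1:3]) %>%
    # 1.96 is the normal approximation to the 95% t quantile
    mutate(lower.CL = fit - 1.96 * se.fit, upper.CL = fit + 1.96 * se.fit) %>%
    rename(emmean = fit, SE = se.fit)
}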
emmeans2(m1, data.frame(x=c(5, 10, 15, 20), xk_15=c(0, 0, 0, 5)))
#>    x xk_15   emmean        SE  df  lower.CL upper.CL
#> 1  5     0 10.01255 0.2557274 997  9.511327 10.51378
#> 2 10     0 19.86030 0.2067146 997 19.455142 20.26546
#> 3 15     0 29.70805 0.3240371 997 29.072940 30.34317
#> 4 20     5 64.78476 0.2085883 997 64.375931 65.19360
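The 1.96 multiplier is a normal approximation, which is very close here because the residual degrees of freedom are large. As a hedged alternative, predict() for lm objects can return t-based limits itself via interval = "confidence". The sketch below uses a small simulated data set and the column names from above purely for illustration:

```r
# Small simulated data set, only so the sketch is self-contained
set.seed(1)
n <- 200
df <- data.frame(x = runif(n, 0, 30))
df$xk_15 <- pmax(df$x - 15, 0)
df$y <- 2 * df$x + 5 * df$xk_15 + rnorm(n, 0, 5)
m1 <- lm(y ~ x + xk_15, data = df)

# Combinations of interest, one per row
newdata <- data.frame(x = c(5, 10, 15, 20), xk_15 = c(0, 0, 0, 5))

# With interval = "confidence", $fit is a matrix with columns fit, lwr, upr
pr <- predict(m1, newdata = newdata, se.fit = TRUE,
              interval = "confidence", level = 0.95)

out <- cbind(newdata,
             emmean   = pr$fit[, "fit"],
             SE       = pr$se.fit,
             df       = pr$df,
             lower.CL = pr$fit[, "lwr"],
             upper.CL = pr$fit[, "upr"])
out
```

The limits are emmean plus or minus qt(0.975, df) times SE, so they shrink toward the 1.96 version as the degrees of freedom grow.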
Of course, this only does a tiny fraction of what emmeans can do, but it works perfectly well in simple cases.
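Another option worth considering, sketched here as an assumption rather than something tested on the original data, is to put the knot term in the model formula instead of in a separate column. emmeans then sees x as the only predictor and evaluates the derived term itself, so at = only needs x. The model name m2 and the simulated data are illustrative:

```r
library(emmeans)

# Small simulated data set, only so the sketch is self-contained
set.seed(1)
n <- 200
df <- data.frame(x = runif(n, 0, 30))
df$y <- 2 * df$x + 5 * pmax(df$x - 15, 0) + rnorm(n, 0, 5)

# The hinge term is computed from x inside the formula,
# so no separate xk_15 column is needed
m2 <- lm(y ~ x + pmax(x - 15, 0), data = df)

# Only x has to be specified; the hinge term follows from it
emm <- emmeans(m2, "x", at = list(x = c(5, 10, 15, 20)))
emm
```

Because the hinge is a function of x, impossible combinations such as x = 5 with a nonzero hinge value cannot arise in the reference grid.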