I am looking for help estimating the effect of a third order polynomial term -- over and above the linear term at different values of the independent variable x
. While I can estimate the point estimate, I am not sure how to calculate the confidence interval. I am thinking that marginal effects and the marginaleffects
package may be the answer here -- but this is relatively new to me am not sure how to put this together.
If you have a different package, formula, or code that seems like the right answer, I'm definitely interested in that too! The general application here is an unexpected event design / interrupted time series using social science survey data.
Here is a reproducible example. In this example, there is a running variable X
ranging between about x = -2.5
and x = 2.5
and the outcome variable Y
. There are two quantities of interest:
x = 0
(i.e. the jump)In this example, the outcome variable Y
is a stepwise function of X
. I am using a third order polynomial to approximate this function, which as the designer of the simulation we know, even though it is incorrect.
Here is a figure (produced by the code below) that visualizes the simulated data (blue dots), the original trajectory (dashed orange line), and the third order polynomial (black line).
... and here is the code
library(margins)
library(tidyverse)
set.seed(1900)
# Create Data
N <- 200
quad <- data.frame(x = rnorm(N)) %>%
mutate(post = as.integer(x > 0),
x_sq = x^2,
x_th = x^3)
# Create outcome (stepwise outcome with slight upward slope, jump, gradual decrease back to original slope)
quad$y <- with(quad, 0.002 * x + 0.05 * post - 0.05 * x * post * (x <= 1) - 0.05 * (x > 1))
# Visualize outcome
plot(quad$x, quad$y)
# Specify model (third order polynomial)
mod_quad <- lm(formula = y ~ x + x_sq + x_th + post + post * x + post * x_sq + post * x_th, data = quad)
# Plot Predictions
plot_predictions(mod_quad, by = "x", vcov = TRUE) +
geom_point(data = quad,
aes(x = x, y = y),
color = "blue",
alpha = 0.5) +
geom_line(data = tibble(x = seq(min(quad$x), max(quad$x), 0.01),
y = x * 0.002),
aes(x = x, y = y),
linetype = "dashed",
color = "darkorange") +
theme_bw()
Thank you for your help! Let me know if there are any clarifying questions.
You should not hardcode the polynomial terms. To compute the standard errors, marginaleffects
requires numerical derivatives which are obtained by slightly tweaking the value of x
. But when you hardcode the polynomial terms, marginaleffects
tweaks x
but only the x
column is affected, and not x_sq
, x_th
, etc.
Instead, use the poly()
function in the model fitting call, as suggested in comments:
library(dplyr)
library(ggplot2)
library(marginaleffects)
set.seed(1900)
N <- 200
quad <- data.frame(x = rnorm(N)) %>%
mutate(post = as.integer(x > 0),
x_sq = x^2,
x_th = x^3)
quad$y <- with(quad, 0.002 * x + 0.05 * post - 0.05 * x * post * (x <= 1) - 0.05 * (x > 1))
mod <- lm(formula = y ~ post * poly(x, 3), data = quad)
plot_predictions(mod, condition = "x") +
geom_point(data = quad, aes(x, y))