I am interested in adding a random effect to a ggplot using stat_poly_eq().
library (ggplot2)
library(ggpmisc)
data (mtcars)
mtcars$gear <- factor (mtcars$gear)
mtcars$cyl <- factor (mtcars$cyl)
transmission <- data.frame ( code = c(1, 0),
type = c("manual", "automatic"))
mtcars$transmission <- setNames (transmission$type, transmission$code) [as.character (mtcars$am)]
my.formula <- y ~ x
p <- ggplot (data = mtcars, aes(x=hp, y=mpg, group = 1)) +
facet_wrap (~transmission, strip.position = "top", scales = "fixed", nrow = 1) +
ggtitle ("car horsepower (hp) to effiency (mpg) based on cylinder and number of gears for American vs others cars") +
geom_point ( size = 6, shape = 16, aes ( shape=cyl, color=cyl) ) +
scale_color_manual (name = "number of cylinder",
labels = c (paste (levels(mtcars$cyl), "cylinders")),
values = c("turquoise", "orange", "red")) +
geom_smooth (data = mtcars, method="lm",
formula = my.formula,
se = FALSE,
color = "blue",
fill = "blue") +
stat_poly_eq (formula = my.formula,
aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label), after_stat(p.value.label), sep = "~~~")),
parse = TRUE,
size = 4,
col = "black") +
theme (panel.spacing = unit(0, "lines"),
strip.background = element_blank(),
plot.title = element_text(hjust = 0.5,
color = "black",
size = 14,
face = "bold.italic"),
strip.placement = "outside")
print(p)
I'd assume blocking and adding random effects would be the same as for any linear model (+ Block OR + (1 | random.effect). However, I am not getting the code to work. Does anyone have any good ideas?
Here is an example of what I am talking about using mtcars. In this case I'll ask you to imagine that the vehicle weight is a random effect (I understand that it is likely not random effect but the example dataset doesn't include a good random effect. Here is the code I use to add the random effect and the regression line fails.
my.formula <- y ~ x + (1 | mtcars$wt)
When you use method = "lm"
in geom_smooth
, you are using the function lm
to create a simple linear model on each subgroup of points in each panel. ggplot
will split the data frame into multiple smaller data frames to do this. You cannot pass an entire column of the original data to your formula, since it will contain all the rows of the data frame, and hence be a different length from all the other variables in the subgroup being analysed.
Furthermore, ggplot uses the predict
function to get predicted values for your regression. It uses the range of the x axis to generate the values at which to predict. If you use variables other than y
and x
in the formula to geom_smooth
, it will not know what values you want for these variables in the predict
function, and will simply not work. You can therefore only use y
and x
in the formula passed to geom_smooth
. This means it isn't suitable for mixed effect models, which by definition use more than one variable on the right hand side.
To make matters worse, the (1 | wt)
syntax does not create a random intercept mixed effects model inside lm
. If you want to create a mixed-effects model, you need to use a specifically-designed function like lmer
from the lme4
package.
For both the above reasons, stat_poly_eq
will not work in this scenario. It is designed only to work with lm
regressions.
However, all is not lost. It is still possible to get the plot you want using ggplot
as long as we work through the necessary steps.
First, let's create a data set that lends itself to a mixed effects model:
set.seed(1)
df <- data.frame(
x = rep(1:10, 5) + runif(50),
y = rep(1:10, 5) + rep(rnorm(5, 5, 5), each = 10) + rnorm(50),
z = rep(LETTERS[1:5], each = 10)
)
We can load lme4
and create our model with a random intercept
library(lme4)
#> Loading required package: Matrix
model <- lmer(y ~ x + (1|z), data = df)
model
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ x + (1 | z)
#> Data: df
#> REML criterion at convergence: 161.1372
#> Random effects:
#> Groups Name Std.Dev.
#> z (Intercept) 3.3242
#> Residual 0.9568
#> Number of obs: 50, groups: z, 5
#> Fixed Effects:
#> (Intercept) x
#> 2.8896 0.9926
Now, let's say we want to plot the implied regression lines for each group of our data, as well as the overall fixed effect in ggplot. We can start by creating a data frame of x
and z
values for which we want to predict the y
values. We then feed this to predict
and store the result:
plot_df <- data.frame(x = rep(1:10, 5), z = rep(LETTERS[1:5], each = 10))
plot_df$y <- predict(model, newdata = plot_df)
From the comments it seems that you also want an R-squared and p values. Just be aware that these are not as straightforward to obtain or interpret for mixed models, and you should ensure you know what these mean when putting them on your plot. You can obtain them in this case as follows:
r2 <- round(performance::r2_nakagawa(model)$R2_conditional, 3)
pval <- scales::pvalue(car::Anova(model, test = "F")$`Pr(>F)`, add_p = TRUE)
All that remains for us to do now is plot our raw data, the regression lines, the overall fixed effect, and the equation for the fixed effect using a custom annotation extracted from our model:
library(ggplot2)
ggplot(df, aes(x, y, color = z)) +
geom_point(size = 3, alpha = 0.5) +
geom_line(data = plot_df, linetype = 2) +
geom_abline(aes(intercept = fixef(model)[1], slope = fixef(model)[2],
linetype = "Fixed effect"), key_glyph = draw_key_path,
linewidth = 1) +
scale_color_brewer(palette = "Set1") +
labs(linetype = NULL, color = "groups (z)") +
annotation_custom(grid::textGrob(
bquote(bold(~y ==
.(round(fixef(model)[1], 3)) + .(round(fixef(model)[2], 3)) * x) ~
~~~italic(paste(.(pval))~~~R^2 == .(r2))),
x = 0.05, y = 0.9, hjust = 0, gp = grid::gpar(cex = 1.2))) +
theme_minimal(base_size = 16) +
ggtitle("Illustration of mixed effects model with random intercept") +
theme(legend.position = "top",
plot.title = element_text(face = 2, hjust = 0.5),
panel.border = element_rect(fill = NA, linewidth = 0.2))