Imagine I have two separate lm
objects
data(mtcars)
lm1 <- lm(mpg ~ wt, data = mtcars)
lm2 <- lm(mpg ~ wt + disp, data = mtcars)
In this case I'd like to compare both wt
coefficients, and perform a hypothesis test on the null that the coefficients in both models are equal(for technical reason I need to actually have two models, rather than just including an interaction)
Since you want to perform a hypothesis test on the estimates, I suggest a fully Bayesian model, which will get you the full posterior distribution of every variable.
rstanarm
is based on Stan
, and offers convenient functions that mimic the usual lm
, glm
syntax; if you want to know more about Stan
/RStan
, see here.
Based on the posterior distributions of every variable, we can then perform e.g. a t test and Kolmogorov-Smirnov test to compare the full posterior densities for every variable.
Here is what you could do:
Perform the model fits.
library(rstanarm);
lm1 <- stan_lm(mpg ~ wt, data = mtcars, prior = NULL);
lm2 <- stan_lm(mpg ~ wt + disp, data = mtcars, prior = NULL);
Note how easy it is to run a fully Bayesian linear model with rstanarm
.
Extract the posterior densities for all shared coefficients (in this case, the (Intercept)
and wt
).
library(tidyverse);
shared.coef <- intersect(names(coef(lm1)), names(coef(lm2)));
shared.coef;
#[1] "(Intercept)" "wt"
df1 <- lm1 %>%
as.data.frame() %>%
select(one_of(shared.coef)) %>%
mutate(model = "lm1");
df2 <- lm2 %>%
as.data.frame() %>%
select(one_of(shared.coef)) %>%
mutate(model = "lm2");
The posterior densities for 4000 MCMC draws are stored in two data.frame
s.
We plot the posterior densities.
# Plot posterior densities for all common parameters
bind_rows(df1, df2) %>%
gather(var, value, 1:length(shared.coef)) %>%
ggplot(aes(value, colour = model)) +
geom_density() +
facet_wrap(~ var, scale = "free");
We compare the posterior density distributions of every shared parameter in a t test and a KS test. Here I'm using library broom
to tidy-up the output.
# Perform t test and KS test
library(broom);
res <- lapply(1:length(shared.coef), function(i)
list(t.test(df1[, i], df2[, i]), ks.test(df1[, i], df2[, i])));
names(res) <- shared.coef;
lapply(res, function(x) bind_rows(sapply(x, tidy)));
#$`(Intercept)`
# estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#1 -4.497093 30.07725 34.57434 -104.8882 0 7155.965 -4.581141 -4.413045
#2 NA NA NA 0.7725 0 NA NA NA
# method alternative
#1 Welch Two Sample t-test two.sided
#2 Two-sample Kolmogorov-Smirnov test two-sided
#
#$wt
# estimate estimate1 estimate2 statistic p.value parameter conf.low
#1 0.1825202 -3.097777 -3.280297 9.120137 1.074479e-19 4876.248 0.1432859
#2 NA NA NA 0.290750 0.000000e+00 NA NA
# conf.high method alternative
#1 0.2217544 Welch Two Sample t-test two.sided
#2 NA Two-sample Kolmogorov-Smirnov test two-sided
#
#There were 12 warnings (use warnings() to see them)
(The warnings originate from unequal factor levels when binding rows, and can be ignored.)