Is there some way of directly (jointly) modeling the random intercepts estimated with lme4's lmer() or brms? For example, in the code below I fit a hierarchical model, extract the random intercepts, and then model them.
One downside of this two-step approach is that it ignores the error in the estimated intercepts. That is easily addressed with a robust covariance matrix, weighted least squares, etc. (a weighted fit is sketched after the code below), but jointly estimating all of this in a single model would be preferable.
For context: I am interested in this because I am estimating an item response model in which each random intercept is a person's ability at a point in time, and I want to predict those abilities. I'll be doing all of this within a much more complex Bayesian model.
library(lme4)
library(tibble)

set.seed(123)

# Simulate longitudinal data
N <- 100
time <- 2

# Time-varying data
df <- tibble(person = rep(1:N, time),
             x = rnorm(N * time),
             y = 2 + x * runif(N * time))

# Fit hierarchical model
mod <- lmer(y ~ -1 + (1 | person), df)

# Time-invariant data (constant within person)
df_person <- data.frame(ints = data.frame(ranef(mod))$condval,
                        sex = rbinom(N, size = 1, prob = 0.5))

# Model intercepts as function of time-invariant feature
summary(lm(ints ~ 1 + sex, df_person))
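For example, here is a minimal sketch of the weighted-least-squares version of the second stage, weighting each intercept by the inverse of its conditional variance (the condsd column that ranef() reports). This accounts for the intercepts' uncertainty, but it's still not a joint model.

# WLS fix: weight each estimated intercept by the inverse of its
# conditional variance, taken from ranef()
re <- data.frame(ranef(mod))
df_person$w <- 1 / re$condsd^2
summary(lm(ints ~ 1 + sex, df_person, weights = w))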
I don't know about lme4 or brms, but it can be done directly in Stan. Here's a way to replicate your model with everything jointly estimated; see this example (in Python, not R) and this paper for more.
We model the observed values of $y$ with by-person random intercepts $\alpha_j$ (one for each person $j$) and a scale parameter $\sigma$:
$$y_i \sim \text{Normal}(\alpha_{j[i]}, \sigma)$$
We model the by-person random intercepts as a function of the person's sex and the coefficient $\beta$, plus a mean $\mu_\alpha$ and a second scale parameter $\sigma_\alpha$:
$$\alpha_j \sim \text{Normal}(\mu_\alpha + \beta \, \text{sex}_j, \sigma_\alpha)$$
Here's the Stan code for the model described above. It uses a non-centered parameterization: the "raw" values of $\alpha_j$ have a standard normal prior, and we convert them to the "real" values of $\alpha_j$ using $\mu_\alpha$ and $\sigma_\alpha$ (plus the sex effect $\beta \, \text{sex}_j$).
data {
  int<lower=0> N_obs;                          // number of observations
  int<lower=2> N_person;                       // number of persons
  int<lower=1,upper=N_person> person[N_obs];   // person associated with each observation
  vector<lower=0,upper=1>[N_person] sex;       // sex of each person
  vector[N_obs] y;                             // observed outcomes
}
parameters {
  real<lower=0> sigma;                 // observation-level variation
  vector[N_person] alpha_person_raw;   // random intercepts for persons (raw, non-centered)
  real mu_alpha;                       // mean of random intercepts for persons
  real<lower=0> sigma_alpha;           // scale of random intercepts for persons
  real beta;                           // coefficient for sex
}
transformed parameters {
  vector[N_person] alpha_person = (alpha_person_raw * sigma_alpha) +
                                  mu_alpha +
                                  (sex * beta);
}
model {
  sigma ~ exponential(1);
  alpha_person_raw ~ std_normal();
  mu_alpha ~ std_normal();
  sigma_alpha ~ exponential(1);
  beta ~ std_normal();
  y ~ normal(alpha_person[person], sigma);
}
I recreated the dataset using the same rules as in the original example, but in a slightly different format that's easier for Stan to work with. (Save the Stan program above as two_level_model.stan; that's the filename used when fitting below.)
library(tidyverse)

N <- 100   # number of persons, as in the original example
time <- 2  # number of time points per person

person.df = data.frame(person = 1:N, sex = rbinom(N, size = 1, prob = 0.5))

obs.df = data.frame(person = rep(1:N, time),
                    x = rnorm(N * time)) %>%
  mutate(y = 2 + (x * runif(N * time)))
library(rstan)

stan.data = list(
  N_obs = nrow(obs.df),
  N_person = nrow(person.df),
  person = obs.df$person,
  sex = person.df$sex,
  y = obs.df$y
)

stan.model = stan("two_level_model.stan", data = stan.data, chains = 3)
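Before comparing the approaches, rstan's print() method gives a quick summary of the jointly estimated parameters (along with Rhat and effective-sample-size diagnostics):

# Posterior summaries and convergence diagnostics for the key parameters
print(stan.model, pars = c("mu_alpha", "beta", "sigma_alpha", "sigma"))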
For comparison, I re-fit the two-stage model using the new datasets.
first.level.m = lmer(y ~ -1 + (1 | person), obs.df)

second.level.m = lm(intercept ~ 1 + sex,
                    person.df %>%
                      mutate(intercept = ranef(first.level.m)$person[["(Intercept)"]]))
Now, let's compare the estimated parameter values. The two approaches agree that the interval for the effect of sex includes 0 and that the average of the random intercepts is approximately 2. (Naturally, these estimates are influenced by the fact that x isn't yet included as a predictor in the models.) They estimate approximately the same error term at the observation level, but the joint model estimates a substantially smaller error term for the model of the random intercepts. I'm not sure how the model-fitting procedure of lme4 relates to the exponential prior I put on $\sigma_\alpha$ in the Stan model.
library(tidybayes)
bind_rows(
spread_draws(stan.model, mu_alpha, beta, sigma_alpha, sigma) %>%
ungroup() %>%
dplyr::select(.draw, mu_alpha, beta, sigma_alpha, sigma) %>%
pivot_longer(cols = -.draw, names_to = "parameter") %>%
group_by(parameter) %>%
summarise(lower.95 = quantile(value, 0.025),
lower.50 = quantile(value, 0.25),
est = median(value),
upper.50 = quantile(value, 0.75),
upper.95 = quantile(value, 0.975)) %>%
ungroup() %>%
mutate(parameter = case_when(parameter == "mu_alpha" ~ "mu[alpha]",
parameter == "beta" ~ "beta",
parameter == "sigma_alpha" ~ "sigma[alpha]",
parameter == "sigma" ~ "sigma"),
model = "joint"),
summary(second.level.m)$coefficients %>%
data.frame() %>%
rownames_to_column("parameter") %>%
mutate(lower.95 = Estimate + (Std..Error * qt(0.025, second.level.m$df.residual)),
lower.50 = Estimate + (Std..Error * qt(0.25, second.level.m$df.residual)),
est = Estimate,
upper.50 = Estimate + (Std..Error * qt(0.75, second.level.m$df.residual)),
upper.95 = Estimate + (Std..Error * qt(0.975, second.level.m$df.residual)),
parameter = case_when(parameter == "(Intercept)" ~ "mu[alpha]",
parameter == "sex" ~ "beta"),
model = "two-stage") %>%
dplyr::select(model, parameter, est, matches("lower|upper")),
data.frame(est = summary(second.level.m)$sigma,
parameter = "sigma[alpha]",
model = "two-stage"),
data.frame(est = summary(first.level.m)$sigma,
parameter = "sigma",
model = "two-stage")
) %>%
mutate(parameter = fct_relevel(parameter, "beta", "mu[alpha]", "sigma[alpha]",
"sigma")) %>%
ggplot(aes(x = parameter, color = model)) +
geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1,
position = position_dodge(width = 0.3)) +
geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2,
position = position_dodge(width = 0.3)) +
geom_point(aes(y = est), size = 3, position = position_dodge(width = 0.3)) +
scale_x_discrete(labels = scales::parse_format()) +
labs(x = "Parameter", y = "Estimated parameter value", color = "Model") +
coord_flip() +
theme_bw()