I've written a data-generating function in R which simulates data used for hierarchical (multi-level) modeling. The function generates both fixed and random effects with correlated intercepts and slopes. I am unsure if I've implemented the correlation coefficient (rho), representing the correlation between random intercepts and slopes, correctly within the function.
My function looks like this:
data_generating_function <- function(n_classes, n_students, gamma00, gamma10, gamma01,
sd_intercept, sd_slope, rho_intercept_slope) {
data <- data.frame(
class_id = rep(1:n_classes, each = n_students),
x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
z1j = rep(rnorm(n_classes, mean = 0), each = n_students), # Level 2 predictor
yij = 1 # Placeholder, will be updated later
)
X <- model.matrix(~ x1ij + z1j, data) # fixed effects model matrix
myFormula <- "yij ~ x1ij + z1j + (x1ij|class_id)"
foo <- lFormula(eval(myFormula), data)
Z <- t(as.matrix(foo$reTrms$Zt)) # random effects design matrix
betas <- c(gamma00, gamma10, gamma01) # Fixed effects vector
# creating the random effects
random_effects_per_class <- mvrnorm(
n = n_classes,
mu = c(0, 0), # Zero mean for the random effects
Sigma = matrix(c(sd_intercept^2, rho_intercept_slope * sd_intercept * sd_slope,
rho_intercept_slope * sd_intercept * sd_slope, sd_slope^2),
nrow = 2, byrow = TRUE)
)
u <- as.vector(random_effects_per_class)
# random effects as vector
e <- rnorm(n_classes * n_students, mean = 0, sd = 1)
# error
data$yij <- X %*% betas + Z %*% u + e
# creating data
return(data)
}
In my understanding a increase in rho_intercept_slope
should result in an increase in the correlation of the intercept and slope.
data <- data_generating_function(n_classes = 10, n_students = 100, gamma00 = 100, gamma10 = 3, gamma01 = 1,
sd_intercept = 5, sd_slope = 2, rho_intercept_slope = 0.5)
model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
summary(model)
However no matter the value of rho_intercept_slope
the value of Corr
always is around 0. I have also checkt this via simulation, where I ran the simulation multiple times.
Random effects:
Groups Name Variance Std.Dev. Corr
class_id (Intercept) 11.8927 3.4486
x1ij 11.3084 3.3628 0.02
Residual 0.9745 0.9872
Number of obs: 10000, groups: class_id, 100
Any ideas on how to fix this?
As far as I can tell the problem is that you unpacked your random-effects vector in the wrong order. I changed the corresponding line of code to
u <- c(t(random_effects_per_class))
and got more sensible answers. For what it's worth, before I went bug-hunting I made my own data-generating function that uses the built-in simulate
method ... the only tricky part is converting the standard deviation/correlation parameters to the parameterization that lme4
expects internally (this uses lme4::Sv_to_Cv()
) ...
dgf2 <- function(n_classes, n_students, gamma00, gamma10, gamma01,
sd_intercept, sd_slope, rho_intercept_slope) {
dd <- data.frame(
class_id = rep(1:n_classes, each = n_students),
x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
z1j = rep(rnorm(n_classes, mean = 0), each = n_students) # Level 2 pr
)
## convert sd/cor parameters to Cholesky factor
theta <- unname(Sv_to_Cv(c(sd_intercept, rho_intercept_slope, sd_slope)))
dd$yij <- simulate(~ x1ij + z1j + (x1ij|class_id),
newdata = dd,
newparams = list(beta = c(gamma00, gamma10, gamma01),
theta = theta,
sigma = 1))[[1]]
return(dd)
}
Testing framework:
simfun <- function(dgf = data_generating_function) {
data <- dgf(n_classes = 10, n_students = 100,
gamma00 = 100, gamma10 = 3, gamma01 = 1,
sd_intercept = 5, sd_slope = 2,
rho_intercept_slope = 0.5)
model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
cov2cor(VarCorr(model)[[1]])[1,2]
}
set.seed(101)
rho_vec <- replicate(500, simfun())
hist(rho_vec)
rho_vec2 <- replicate(500, simfun(dgf2))
hist(rho_vec2)