rsimulationmixed-models

Simulating correlated random intercept and slope for mixed model data


I've written a data-generating function in R which simulates data used for hierarchical (multi-level) modeling. The function generates both fixed and random effects with correlated intercepts and slopes. I am unsure if I've implemented the correlation coefficient (rho), representing the correlation between random intercepts and slopes, correctly within the function.

My function looks like this:

data_generating_function <- function(n_classes, n_students, gamma00, gamma10, gamma01, 
                                     sd_intercept, sd_slope, rho_intercept_slope) {
  data <- data.frame(
    class_id = rep(1:n_classes, each = n_students),
    x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
    z1j = rep(rnorm(n_classes, mean  = 0), each = n_students),  # Level 2 predictor
    yij = 1  # Placeholder, will be updated later
  )
  
  X <- model.matrix(~ x1ij + z1j, data)  # fixed effects model matrix
  myFormula <- "yij ~ x1ij + z1j + (x1ij|class_id)"
  foo <- lFormula(eval(myFormula), data)
  Z <- t(as.matrix(foo$reTrms$Zt))  # random effects design matrix
  
  betas <- c(gamma00, gamma10, gamma01)  # Fixed effects vector 
  

# creating the random effects
  random_effects_per_class <- mvrnorm( 
    n = n_classes, 
    mu = c(0, 0), # Zero mean for the random effects
    Sigma = matrix(c(sd_intercept^2, rho_intercept_slope * sd_intercept * sd_slope, 
                     rho_intercept_slope * sd_intercept * sd_slope, sd_slope^2), 
                   nrow = 2, byrow = TRUE)
  )
  


  u <- as.vector(random_effects_per_class)
# random effects as vector
  
  e <- rnorm(n_classes * n_students, mean = 0, sd = 1)
# error
  
  data$yij <- X %*% betas + Z %*% u + e
# creating data

  
  return(data)
}

In my understanding a increase in rho_intercept_slope should result in an increase in the correlation of the intercept and slope.

data <- data_generating_function(n_classes = 10, n_students = 100, gamma00 = 100, gamma10 = 3, gamma01 = 1, 
                                 sd_intercept = 5, sd_slope = 2, rho_intercept_slope = 0.5)


model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
summary(model)

However no matter the value of rho_intercept_slope the value of Corr always is around 0. I have also checkt this via simulation, where I ran the simulation multiple times.

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 class_id (Intercept) 11.8927  3.4486       
          x1ij        11.3084  3.3628   0.02
 Residual              0.9745  0.9872       
Number of obs: 10000, groups:  class_id, 100

Any ideas on how to fix this?


Solution

  • As far as I can tell the problem is that you unpacked your random-effects vector in the wrong order. I changed the corresponding line of code to

    u <- c(t(random_effects_per_class))
    

    and got more sensible answers. For what it's worth, before I went bug-hunting I made my own data-generating function that uses the built-in simulate method ... the only tricky part is converting the standard deviation/correlation parameters to the parameterization that lme4 expects internally (this uses lme4::Sv_to_Cv()) ...

    dgf2 <- function(n_classes, n_students, gamma00, gamma10, gamma01, 
                        sd_intercept, sd_slope, rho_intercept_slope) {
        
        dd <- data.frame(
            class_id = rep(1:n_classes, each = n_students),
            x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
            z1j = rep(rnorm(n_classes, mean  = 0), each = n_students)  # Level 2 pr
        )
        ## convert sd/cor parameters to Cholesky factor
        theta <- unname(Sv_to_Cv(c(sd_intercept, rho_intercept_slope, sd_slope)))
        dd$yij <- simulate(~ x1ij + z1j + (x1ij|class_id),
                             newdata = dd,
                             newparams =  list(beta = c(gamma00, gamma10, gamma01),
                                               theta = theta,
                                               sigma = 1))[[1]]
        return(dd)
    }
    

    Testing framework:

    simfun <- function(dgf = data_generating_function) {
        data <- dgf(n_classes = 10, n_students = 100,
                    gamma00 = 100, gamma10 = 3, gamma01 = 1, 
                    sd_intercept = 5, sd_slope = 2,
                    rho_intercept_slope = 0.5)
        model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
        cov2cor(VarCorr(model)[[1]])[1,2]
    }
    set.seed(101)
    rho_vec <- replicate(500, simfun())
    hist(rho_vec)
    rho_vec2 <- replicate(500, simfun(dgf2))
    hist(rho_vec2)