rmixed-modelstrialsample-size

Sample size calculation for a multicenter RCT using a mixed effects model using simulated data in R


I am currently setting up a multicenter RCT of a treatment versus placebo (1:1), with a primary outcome of prolongation of pregnancy in days. The minimal clinically relevant difference in days is 5 days (between treatment and placebo). The distribution of prolongation in days is skewed and presumed to be log-normal based on previous studies. The center-specific attitude towards prolongation will have an impact on the primary outcome, and this attitude might change in time during the trial. The size of this impact is really hard for us to estimate.

Based on feedback, I decide to calculate the sample size needed for our trial (power 80%, alpha 0.05) using simulated data (created based on summary statistics from a comparable single-center trial; we don't have much data from our population unfortunately) in a log-normal mixed effects model, using center as random effect and treatment as fixed effect.

This is the information I have from the previous trial: Mean of log-transformed prolongation data (placebo): 2.25 SD of log-transformed prolongation data (placebo): 1.23 Mean of untransformed prolongation data (placebo): 17.6 days SD of untransformed prolongation data (placebo): 18 days Total number of patients in trial: 180 (90 per arm)

My code based an minimal clinically relevant difference (MCRD) of 5 days:


# Define parameters
## I used the same sample size as the other trial, 180 patients (90 per arm)
n_centers <- 9  # Number of centers
n_per_center_per_group <- 10  # Number of participants per treatment group per center
total_n <- n_centers * n_per_center_per_group * 2  # Total number of participants (with 2 treatment arms)
n <- 90

# Generate treatment variable for placebo
treatment_placebo <- rep("Placebo", n_per_center_per_group * n_centers)

# Generate treatment variable for medicine
treatment_medicine <- rep("Medicine", n_per_center_per_group * n_centers)

treatment <- factor(c(treatment_placebo, treatment_medicine))

# Generate center variable for each treatment group
center_placebo <- rep(1:n_centers, each = n_per_center_per_group)
center_medicine <- rep(1:n_centers, each = n_per_center_per_group)

center <- factor(c(center_placebo, center_medicine))

# Combine placebo and medicine data
simulated_data <- data.frame(treatment = factor(c(treatment_placebo, treatment_medicine)),
                             center = factor(c(center_placebo, center_medicine)))


# Generate log-normal prolongation of gestation with different means for treatment and placebo
mean_placebo <- 2.25 # mean of log normal transformed data in previous trial
MCRD <- log (16/11) #MCRD of 5 days on the untransformed scale: 11 days estimated as a mean prolongation in placebo and 16 days in treatment group, transformed to put it on log-scale. 
mean_treatment <- mean_placebo + MCRD 
sd_log <- 1.23  # Standard deviation on the log scale (edited, incorrectly defined as 0.7 in previous version)

log_prolongation_placebo <- rlnorm(n, meanlog = mean_placebo, sdlog = sd_log)
log_prolongation_treatment <- rlnorm(n, meanlog = mean_treatment, sdlog = sd_log)

log_prolongation_placebo <- log(log_prolongation_placebo)
log_prolongation_treatment <- log(log_prolongation_treatment)

log_prolongation <- c(log_prolongation_placebo, log_prolongation_treatment)

# Create data frame
simulated_data <- data.frame(treatment = factor(c(treatment_placebo, treatment_metformin)),
                             log_prolongation = log_prolongation,
                             center = factor(c(center_placebo, center_metformin)))

#RUN SIMULATION WITH SIMULATED DATA
library(lme4)
lmer_model <- lmer(log_prolongation ~ treatment + (1 | center), data = simulated_data)
summary(lmer_model)

powerSim(lmer_model, nsim=1000, alpha = 0.05)

I have little experience with R/statistics and I am afraid that I might not have provided the right input for the model (especially as I am log-transforming data back and forth).

Edited for clarification: Questions:

  1. Is the calculation of MCRD in this code right?
  2. Is the coding for the scales (log-normal) right? Am I not accidentally mixing log-normal and untransformed scales?
  3. Do you have any feedback on the approach taken to calculate the sample size?

Solution

    1. On the log normal scale, differences are fold changes and so you need to relate your MCRD to the placebo mean that you're using. If you back-transform your current values for mean_placebo and mean_treatment you'll see that they're 4.3 days apart not the 5 days that you'd like. You could use:

      mean_placebo <- 2.25 # mean of log normal transformed data in previous trial
      mean_treatment <- log(exp(mean_placebo) + 5) #MCRD of 5 days on the untransformed scale
      MCRD <- mean_treatment - mean_placebo
      

      to make the MCRD 5 days when back transformed onto the 'days' scale. Side note, you've said the SD of the log transformed data is 1.23 but you've used 0.7 in your code, worth checking!

    2. Since your means and SD are on the log normal scale, you can just simulate using rnorm and not need to transform back and forth onto the log scale. If you run the code chunk below, you'll see that you get the same results from taking the log of data simulated using rlnorm and simulating data using rnorm.

      set.seed(42)
      hist(log(rlnorm(n, meanlog = mean_placebo, sdlog = sd_log)))
      set.seed(42)
      hist(rnorm(n, mean = mean_placebo, sd = sd_log))
      

      so you can just use

      log_prolongation_placebo <- rnorm(n, mean = mean_placebo, sd = sd_log)
      log_prolongation_treatment <- rnorm(n, mean = mean_treatment, sd = sd_log)
      

      without further transformation.

    3. This is a multicenter trial so you need to include between center variability. I believe your current SD is from a single center trial so makes sense to use as the within center variability, but you need to search literature for a reasonable value to use for between center variability. When you have that use

      log_prolongation <- log_prolongation + rnorm(n_centers, 0, betweenCentreSD_onLogScale)[factor(c(center_placebo, center_medicine))]
      

      before you create your final simulated data set to include this extra level of variability.

      Hope this all makes sense and best of luck with the trial!