rparallel-processingpower-analysis

How to parallelize simr power curves in R


The powercurve function from the simr package in R (Green & MacLeod, 2016) can incur very long running times when the method used for the calculation of p values is Kenward-Roger or Satterthwaite (see Luke, 2017). Is it possible to speed up the function using parallel computation?


Solution

  • As I have described in this blog post, the powerCurve function can be run in parallel by separating the levels of the factor that is passed to the breaks argument—i.e., various numbers of participants, of stimuli, etc.

    Let’s do a minimal example using a toy lmer model. A power curve will be created for the fixed effect of x along different sample sizes of the grouping factor g.

    Notice that the six sections of the power curve below are serially arranged, one after another. In contrast, to enable parallel processing, each power curve would be placed in a single script, and they would all be run at the same time.

    Although the power curves below run in a few minutes, the settings that are often used (e.g., a larger model; fixed('x', 'sa') instead of fixed('x'); nsim = 500 instead of nsim = 50) take far longer. That is where parallel processing becomes useful.

    Having saved each section of the power curve, we must now combine them to be able to plot them together (if you wish to automate this procedure, consider this function).

    
    library(lme4)
    library(simr)
    
    # Toy model with data from 'simr' package
    fm = lmer(y ~ x + (x | g), data = simdata)
    
    # Extend sample size of `g`
    fm_extended_g = extend(fm, along = 'g', n = 12)
    
    # Parallelize `breaks` by running each number of levels in a separate function.
    
    # 4 levels of g
    pwcurve_4g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 4, 
                            nsim = 50, seed = 123, 
                            # No progress bar
                            progress = FALSE)
    
    # 6 levels of g
    pwcurve_6g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 6, 
                            nsim = 50, seed = 123, 
                            # No progress bar
                            progress = FALSE)
    
    # 8 levels of g
    pwcurve_8g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 8, 
                            nsim = 50, seed = 123, 
                            # No progress bar
                            progress = FALSE)
    
    # 10 levels of g
    pwcurve_10g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 10, 
                             nsim = 50, seed = 123, 
                             # No progress bar
                             progress = FALSE)
    
    # 12 levels of g
    pwcurve_12g = powerCurve(fm_extended_g, fixed('x'), along = 'g', breaks = 12, 
                             nsim = 50, seed = 123, 
                             # No progress bar
                             progress = FALSE)
    
    # Create a destination object using any of the power curves above.
    all_pwcurve = pwcurve_4g
    
    # Combine results
    all_pwcurve$ps = c(pwcurve_4g$ps[1], pwcurve_6g$ps[1], pwcurve_8g$ps[1], 
                       pwcurve_10g$ps[1], pwcurve_12g$ps[1])
    
    # Combine the different numbers of levels.
    all_pwcurve$xval = c(pwcurve_4g$nlevels, pwcurve_6g$nlevels, pwcurve_8g$nlevels, 
                         pwcurve_10g$nlevels, pwcurve_12g$nlevels)
    
    print(all_pwcurve)
    #> Power for predictor 'x', (95% confidence interval),
    #> by number of levels in g:
    #>       4: 46.00% (31.81, 60.68) - 40 rows
    #>       6: 74.00% (59.66, 85.37) - 60 rows
    #>       8: 92.00% (80.77, 97.78) - 80 rows
    #>      10: 98.00% (89.35, 99.95) - 100 rows
    #>      12: 100.0% (92.89, 100.0) - 120 rows
    #> 
    #> Time elapsed: 0 h 0 m 14 s
    
    plot(all_pwcurve, xlab = 'Levels of g')
    

    Created on 2023-09-04 with reprex v2.0.2