r mutate across

Create a large number of new columns


For a jittering exercise, I want to automatically create a large number of new columns based on a row-wise operation. I have a data frame with unit ids and one indicator measurement per year. First, I compute the standard deviation of the indicator across years for each unit:

library(tidyverse)
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010),
                 indicator = c(12,13,8, 23,21,17))


df <- df %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

Now I want to create additional columns containing dummy indices for statistical testing. I use rnorm for that:

test <- df %>%
  group_by(id) %>%
  mutate(test1 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test2 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test3 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test4 = rnorm(n(), mean = indicator, sd = indicator_sd)) %>%
  ungroup()

This all works fine, but I want to repeat this draw several hundred times. I have played around with across but have not found a workable solution, even though this seems like it should be trivial.

Can anyone advise me on how to automate the mutate? Thank you!


Solution

  • You could use the replicate function from base R:

    library(dplyr)  # provides %>%, group_by, mutate and bind_cols

    # Sample data
    df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                     year = c(2008, 2009, 2010, 2008, 2009, 2010),
                     indicator = c(12, 13, 8, 23, 21, 17))
    
    df <- df %>%
      group_by(id) %>%
      mutate(indicator_sd = sd(indicator)) %>%
      ungroup()
    
    # Choose the number of test columns (to repeat 100 times, replace 4 with 100)
    n <- 4
    
    # Generate n test columns using replicate
    testCols <- as.data.frame(replicate(n, 
                                         rnorm(nrow(df),
                                               mean = df$indicator,
                                               sd = df$indicator_sd)))
    
    # Rename the test columns to "test1", "test2", ...
    names(testCols) <- paste0("test", 1:n)
    
    # Bind (dplyr package) the test columns to the original df
    result <- bind_cols(df, testCols)
    

    The output is shown below (the test values are random draws, so they will differ from run to run):

    # A tibble: 6 x 8
      id     year indicator indicator_sd test1 test2 test3 test4
      <chr> <dbl>     <dbl>        <dbl> <dbl> <dbl> <dbl> <dbl>
    1 A      2008        12         2.65 11.7   9.99 11.7  12.8 
    2 A      2009        13         2.65 15.0  13.7  14.9  16.5 
    3 A      2010         8         2.65  6.12 11.2   9.94  6.43
    4 B      2008        23         3.06 26.2  25.2  25.9  23.6 
    5 B      2009        21         3.06 16.9  22.5  21.7  23.1 
    6 B      2010        17         3.06 21.6  16.7  19.9  19.9
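
As a variation on the replicate approach, the same columns can probably also be built inside a tidyverse pipeline. The following is only a minimal sketch, assuming dplyr and purrr are installed; the object name draws and the set.seed call are additions for illustration and are not part of the original answer.

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so that reruns give the same draws
n <- 4        # number of test columns; raise to several hundred as needed

# Same sample data as above, with the per-id standard deviation attached
df <- data.frame(id = rep(c("A", "B"), each = 3),
                 year = rep(2008:2010, times = 2),
                 indicator = c(12, 13, 8, 23, 21, 17)) %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

# One vector of draws per column name; each row is drawn around its own
# indicator value, using the standard deviation of its id
draws <- paste0("test", seq_len(n)) %>%
  set_names() %>%
  map(~ rnorm(nrow(df), mean = df$indicator, sd = df$indicator_sd)) %>%
  as_tibble()

# Attach the generated columns to the original data
result <- bind_cols(df, draws)
```

Naming the vector with set_names before map means the list elements already carry the column names, so no separate renaming step is needed. Because indicator_sd is already stored on every row, the draws do not have to happen inside group_by.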