For a jittering exercise, I want to automatically create a large number of new columns based on a row-wise operation. I have a data frame with unit ids and one indicator measurement per year, and I first compute the standard deviation of the indicator across years for each unit:
library(tidyverse)
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010),
                 indicator = c(12, 13, 8, 23, 21, 17))
df <- df %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()
Now I want to create additional columns containing dummy indices for statistical testing. I use rnorm for that:
test <- df %>%
  group_by(id) %>%
  mutate(test1 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test2 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test3 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test4 = rnorm(n(), mean = indicator, sd = indicator_sd)) %>%
  ungroup()
This all works fine, except that I want to repeat this test several hundred times. I have played around with across, but have not found a workable solution, even though this seems trivial to do.
Can anyone give me good advice on how to automate the mutate? Thank you!
Well, you could use the replicate function from base R:
library(dplyr)

# Sample data
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010, 2008, 2009, 2010),
                 indicator = c(12, 13, 8, 23, 21, 17))
# Standard deviation of the indicator within each id
df <- df %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()
# First choose the number of iterations (to repeat 100 times, replace 4 with 100)
n <- 4
# Generate n test columns using replicate; each call to rnorm draws one value
# per row, using that row's indicator as the mean and indicator_sd as the sd
testCols <- as.data.frame(replicate(n,
                                    rnorm(nrow(df),
                                          mean = df$indicator,
                                          sd = df$indicator_sd)))
# Rename the test columns to "test1", "test2", ...
names(testCols) <- paste0("test", seq_len(n))
# Bind the test columns to the original df (bind_cols is from dplyr)
result <- bind_cols(df, testCols)
The output looks like this (the test values are random draws, so yours will differ):
# A tibble: 6 x 8
  id     year indicator indicator_sd test1 test2 test3 test4
  <chr> <dbl>     <dbl>        <dbl> <dbl> <dbl> <dbl> <dbl>
1 A      2008        12         2.65 11.7   9.99 11.7  12.8
2 A      2009        13         2.65 15.0  13.7  14.9  16.5
3 A      2010         8         2.65  6.12 11.2   9.94  6.43
4 B      2008        23         3.06 26.2  25.2  25.9  23.6
5 B      2009        21         3.06 16.9  22.5  21.7  23.1
6 B      2010        17         3.06 21.6  16.7  19.9  19.9
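If you would rather stay closer to the tidyverse, roughly the same thing can be sketched with purrr::map(). No group_by is needed for the random draws, because rnorm is vectorised over mean and sd, so each row already gets its own parameters. Note that the seed value, n = 200, and the names test_names and test_cols are my own choices for illustration and do not come from the question; treat this as one possible variant rather than the definitive way to do it.

```r
library(dplyr)
library(purrr)

# Same sample data as above, with the per-id standard deviation attached
df <- data.frame(id = rep(c("A", "B"), each = 3),
                 year = rep(2008:2010, times = 2),
                 indicator = c(12, 13, 8, 23, 21, 17)) %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

# Arbitrary seed (an assumption), only there to make the draws reproducible
set.seed(123)

# Number of test columns to create (assumed value)
n <- 200
test_names <- paste0("test", seq_len(n))

# Build a named list with one vector of draws per test column;
# rnorm uses each row's own mean and sd
test_cols <- map(set_names(test_names),
                 function(nm) rnorm(nrow(df),
                                    mean = df$indicator,
                                    sd = df$indicator_sd))

# Turn the named list into a tibble and attach it to the original data
result <- bind_cols(df, as_tibble(test_cols))
```

Because the list is named before it is converted, the columns arrive already labelled test1 to test200 and no separate renaming step is needed. With several hundred columns you may find a long format easier to work with afterwards, but that depends on what the statistical test expects.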