I have a dataset with thousands of date-times for two kinds of event, A and B, and I want to test whether there is some dependence between them. To do so, I wish to randomly shuffle the times in A and B, calculate the time difference for each shuffled pair (A to B), and then take the mean of all those differences. I wish to repeat this test hundreds of times.
I'm therefore looking for a loop or function rather than copying and pasting the code.
# the data frame is structured like this with many more observations
set.seed(10)
A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 12)
B <- sample(seq(as.Date('2000/01/01'), as.Date('2010/01/01'), by="day"), 12)
df <- data.frame(A, B)
I have been able to generate the output I need as follows, but I need to repeat this many times, i.e. end up with hundreds of mean_shuffled results:
shuffled_A = sample(df$A)
shuffled_B = sample(df$B)
df_shuffled <- data.frame(shuffled_A, shuffled_B)
df_shuffled$diff <- difftime(df_shuffled$shuffled_B, df_shuffled$shuffled_A)
mean_shuffled <- mean(df_shuffled$diff)
Following @jblood94's comments, I have added the example below:
# the data frame is structured like this with many more observations
set.seed(100)
A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 120)
B <- A + 2 # as I am testing that B is dependent on A, so B always takes place after A
df <- data.frame(A, B)
df = transform(df, C = sample(A), D = sample(B), E = sample(A), G = sample(B) ) # to create two shuffled diff times
df$diff <- difftime(df$B, df$A) # observed data
df$diff_shuffle1 <- abs(difftime(df$D, df$C, units = "days")) # A and B at random times; abs() because the difference can be positive or negative
df$diff_shuffle2 <- abs(difftime(df$G, df$E, units = "days")) # A and B at random times, second shuffle
mean(df$diff) # observed mean
mean(df$diff_shuffle1) # mean shuffled time difference between A and B if they happen at random times
mean(df$diff_shuffle2) # mean shuffled time difference between A and B if they happen at random times
You can wrap what you've done in a for() loop that runs for a given number of simulations, nsims, tracks each simulation with sim as it loops around, and adds each result to output. Note that data keeps the static original, while df is rebuilt inside the loop on every pass.
set.seed(100)
A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 120)
B <- A + 2 # as I am testing that B is dependent on A, so B always takes place after A
data <- data.frame(A, B)
nsims <- 100
sim <- 1
output <- data.frame()
for(i in 1:nsims){
  # reshuffle A and B on every pass to create two shuffled sets of times
  df = transform(data, C = sample(A), D = sample(B), E = sample(A), G = sample(B))
  df$diff <- difftime(df$B, df$A) # observed data
  # abs() because a shuffled difference can be positive or negative
  df$diff_shuffle1 <- abs(difftime(df$D, df$C, units = "days")) # A and B at random times
  df$diff_shuffle2 <- abs(difftime(df$G, df$E, units = "days")) # A and B at random times, second shuffle
  obsM <- mean(df$diff) # observed mean (the same on every pass)
  shuf1M <- mean(df$diff_shuffle1) # mean shuffled difference if A and B happen at random times
  shuf2M <- mean(df$diff_shuffle2) # mean shuffled difference if A and B happen at random times
  out <- data.frame(obsM, shuf1M, shuf2M, sim)
  output <- rbind(output, out) # append this simulation's row
  sim <- sim + 1
}
output
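If you would rather have a function than a loop, the same idea could be sketched with replicate(). This is only a minimal sketch under a few assumptions: shuffled_mean is a hypothetical helper name (it is not from the question), it does one shuffle per call rather than two, and it converts the differences to plain numbers of days so the result is an ordinary numeric vector.

```r
# Hypothetical helper (name is an assumption): shuffle both columns once and
# return the mean absolute time difference in days.
shuffled_mean <- function(data) {
  shuffled_A <- sample(data$A)
  shuffled_B <- sample(data$B)
  mean(abs(as.numeric(difftime(shuffled_B, shuffled_A, units = "days"))))
}

# Same example data as above
set.seed(100)
A <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 120)
B <- A + 2
data <- data.frame(A, B)

nsims <- 100

# One shuffled mean per simulation, collected into a numeric vector
shuffled_means <- replicate(nsims, shuffled_mean(data))

# Observed mean difference in days, computed once outside the simulations
observed_mean <- mean(as.numeric(difftime(data$B, data$A, units = "days")))

summary(shuffled_means)
observed_mean
```

Because the observed mean does not change between simulations, it is computed once here instead of on every pass, and replicate() avoids growing a data frame with rbind() inside the loop.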