Tags: r, for-loop, dplyr, left-join, assign

Writing a for loop that appends a column by joining on every iteration in R


I need to join data to a tibble (or dataframe) iteratively such that the tibble grows by one column every time the loop is executed.

Let bin_list be the identifier that the join is executed on:

bin_list<-c(6,7,8,9,10,11,12,13)

and let the following three tibbles be joined one at a time:

hour_1<-(tibble(bin_list=c(3,4,5,6,7,8,9,10,11,12,13), rain=c(0,0,.25,0,0,.25,0,0,0,0,.25)))
hour_2<-(tibble(bin_list=c(3,4,5,6,7,8,9,10,11,12,13), rain=c(0,0,.25,0,0,0,0,0,.25,0,.25)))
hour_3<-(tibble(bin_list=c(3,4,5,6,7,8,9,10,11,12,13), rain=c(0,0,.25,0,0,.25,0,0,.5,0,.25)))

Ultimately, I'm trying to produce:

final<- tibble(bin_list=c(6,7,8,9,10,11,12,13), hour_1=c(0,0,.25,0,0,0,0,.25), hour_2=c(0,0,0,0,0,.25,0,.25), hour_3=c(0,0,.25,0,0,.5,0,.25))

I've been messing around with 'for', 'left_join', and 'assign' but can't crack it. I know there may be a more efficient way to join these hours (nested left joins, maybe), but I'm dealing with moderately sized data.

bin_list is ~75,000 rows, and each hour_i is stored as a .txt file of ~1.5 million rows. What I'm trying to accomplish is: read hour_1, left_join it to bin_list, and assign the result in the environment; read hour_2, left_join it to that result (hour_1 already joined to bin_list), and assign again; read hour_3, and so on.



Solution

  • You could just do a series of left_join()s in a loop:

    library(dplyr)
    bin_list <- data.frame(bin_list = c(6,7,8,9,10,11,12,13))
    hour_1<-(tibble(bin_list=c(3,4,5,6,7,8,9,10,11,12,13), rain=c(0,0,.25,0,0,.25,0,0,0,0,.25)))
    hour_2<-(tibble(bin_list=c(3,4,5,6,7,8,9,10,11,12,13), rain=c(0,0,.25,0,0,0,0,0,.25,0,.25)))
    hour_3<-(tibble(bin_list=c(3,4,5,6,7,8,9,10,11,12,13), rain=c(0,0,.25,0,0,.25,0,0,.5,0,.25)))
    
    ## gets the names of all objects that contain hour_ followed by one or more digits.
    ## note: ls() returns names sorted alphabetically, so hour_10 would come before hour_2.
    hrs <- grep("hour_\\d+", ls(), value=TRUE)
    
    ## initialize final with bin_list
    final <- bin_list
    for(h in hrs){
      ## rename the rain column to the object name (e.g. hour_1) before joining
      final <- left_join(final,
                         setNames(get(h), c("bin_list", h)))
    }
    #> Joining with `by = join_by(bin_list)`
    #> Joining with `by = join_by(bin_list)`
    #> Joining with `by = join_by(bin_list)`
    final
    #>   bin_list hour_1 hour_2 hour_3
    #> 1        6   0.00   0.00   0.00
    #> 2        7   0.00   0.00   0.00
    #> 3        8   0.25   0.00   0.25
    #> 4        9   0.00   0.00   0.00
    #> 5       10   0.00   0.00   0.00
    #> 6       11   0.00   0.25   0.50
    #> 7       12   0.00   0.00   0.00
    #> 8       13   0.25   0.25   0.25
    

    Created on 2023-08-11 with reprex v2.0.2
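
Since the question says each hour_i lives in its own .txt file, the same loop can read one file per iteration instead of calling get() on objects already in the environment. The following is only a sketch under assumptions that are not in the question: the files are named hour_1.txt, hour_2.txt, ..., they are tab-separated with a header row, and they sit in a single directory (a temporary one is created here so the example runs on its own).

```r
library(dplyr)

# Hypothetical setup: write three small tab-separated files to a temp directory.
# In practice the .txt files would already exist on disk.
tmp <- tempfile("hours_")
dir.create(tmp)
rain <- list(
  hour_1 = c(0, 0, .25, 0, 0, .25, 0, 0, 0, 0, .25),
  hour_2 = c(0, 0, .25, 0, 0, 0, 0, 0, .25, 0, .25),
  hour_3 = c(0, 0, .25, 0, 0, .25, 0, 0, .5, 0, .25)
)
for (nm in names(rain)) {
  write.table(data.frame(bin_list = 3:13, rain = rain[[nm]]),
              file.path(tmp, paste0(nm, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

bin_list <- data.frame(bin_list = 6:13)

# Find the hour files (assumed naming scheme) and sort them by hour number,
# because alphabetical order would put hour_10 before hour_2.
files <- list.files(tmp, pattern = "^hour_\\d+\\.txt$", full.names = TRUE)
hour_num <- as.integer(gsub("\\D", "", basename(files)))
files <- files[order(hour_num)]

final <- bin_list
for (f in files) {
  h <- tools::file_path_sans_ext(basename(f))          # e.g. "hour_1"
  hour_i <- read.table(f, header = TRUE, sep = "\t")   # only one file in memory at a time
  final <- left_join(final,
                     setNames(hour_i, c("bin_list", h)),
                     by = "bin_list")
}
final
```

Passing by = "bin_list" explicitly silences the "Joining with" messages. Because each file is read and joined inside the loop, only one 1.5-million-row table needs to be in memory at once, and final never grows beyond the rows of bin_list.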