rdataframeif-statementdplyrassign

How to assign values to a df$column from another df using conditions in R


I have two dataframes :

> df1 <- data.frame(date = as.Date( c( "2021-06-01", "2021-06-02", "2021-06-03", "2021-06-04", 
                                      "2021-06-05", "2021-06-06", "2021-06-07", "2021-06-08", 
                                      "2021-06-09", "2021-06-10", "2021-06-11", "2021-06-12",
                                      "2021-06-13") ), 
                   temperature = c( 17, 30, 28, 29, 16, 21, 20, 11, 28, 29, 25, 26, 19) )

and

> df2 <- data.frame( ID = c( 1 : 4 ), 
                    date.pose = as.Date(c("2021-06-01", "2021-06-03", "2021-06-06", "2021-06-10") ),
                    date.withdrawal = as.Date(c("2021-06-02", "2021-06-05", "2021-06-09", "2021-06-13") ) )

I want to store the mean temperature for each period that is in df2 in a new column (df2$mean.temperature).

For ID = 1 from df2, the mean temperature would be calculated with the temperatures from 2021-06-01 and 2021-06-02, witch is mean(17, 30)

In other words, I want to get this :

    > df2 <- data.frame(ID = c( 1 : 4 ),
                       date.pose = as.Date( c("2021-06-01", "2021-06-03", "2021-06-06", "2021-06-10") ) ,
                       date.withdrawal = as.Date( c("2021-06-03", "2021-06-06", "2021-06-10", "2021-06-13") ),
                       mean.Temperature = c(23.5, 24.3,  20.0, 24.8) )

I'm trying to add the ID from df2 in a new column in df1. Once I do that, I could aggregate like this :

> df3 <- aggregate(df1$temperature, list(df1$ID, df2$date.pose), FUN = mean)

I don't know how to add the corresponding ID in df1. Or maybe there is a better way to do this?


Solution

  • Here's an approach using uncount from tidyr and some joins.

    df2 %>%
      mutate(days = (date.witdrawal - date.pose + 1) %>% as.integer) %>%
      tidyr::uncount(days, .id = "row") %>%
      transmute(ID, date = date.pose + row - 1) %>%
      left_join(df1) %>%
      group_by(ID) %>%
      summarize(mean.Temperature = mean(temperature)) %>% 
      right_join(df2)
    

    Result

    # A tibble: 4 × 4
         ID mean.Temperature date.pose  date.witdrawal
      <int>            <dbl> <date>     <date>        
    1     1             23.5 2021-06-01 2021-06-02    
    2     2             24.3 2021-06-03 2021-06-05    
    3     3             20   2021-06-06 2021-06-09    
    4     4             24.8 2021-06-10 2021-06-13