rdataframedatedplyrpheatmap

Calculate dates based on completion date and make a heatmap


I have a dataset that looks like below

df <- data.frame(
  id = 1:6, # New id column
  drinks_on_previous_day1 = c(5,4,2,1,5,4),
  drinks_on_previous_day2 = c(5,4,2,1,5,5),
drinks_on_previous_day3 = c(5,4,2,1,5,6),
drinks_on_previous_day4 = c(5,4,2,1,5,7),
drinks_on_previous_day5 = c(5,4,2,1,5,8),
  completion_date = as.Date(c("2023-04-18 16:32:17", "2023-04-21 16:32:17", "2023-04-24 16:32:17", "2023-04-25 16:32:17", "2023-05-01 16:32:17", "2023-05-02 16:32:17"))
)

I have the drinks questions indicating the previous days, but I don't have exact dates. These would be different for each participant.

I have their completion date.

For each participant, First I'd like to calculate the dates based on their completion date. I am completely lost as to how to do that for each participant. Finally, I would like the data to be in a long format.

Here is what the dataframe should look like:

df_long <- df %>% 
  pivot_longer(
    cols = starts_with("drinks_on_previous_day"), # Select columns to pivot
    names_to = "day", # Name of the new column for day values
    values_to = "drinks" # Name of the new column for drinks values
  )

But instead of column names in the day column, I'd like to be the exact 5 dates for each participant based on their completion date.

Finally, I would like to plot a heatmap with the drinks on each day with one line for each participant with different colors for different number of drinks.

 >5 drinks -> red
 <5 drinks -> green

Solution

  • I am not quite sure about the plot, but we can add the dates to the data:

    library(tidyverse)
    
    df %>% 
      pivot_longer(
        cols = starts_with("drinks_on_previous_day"), # Select columns to pivot
        names_to = "day", # Name of the new column for day values
        values_to = "drinks" # Name of the new column for drinks values
      ) %>% 
      mutate(day = as.numeric(gsub("drinks_on_previous_day", "", day))) %>% 
      na.omit() %>% 
      mutate(drink_date = completion_date + day - max(day),
             drink_threshold = if_else(drinks >= 5, ">= 5", "< 5"),
             .by = id) -> df_long
    
    #> # A tibble: 30 × 6
    #>       id completion_date   day drinks drink_date drink_threshold
    #>    <int> <date>          <dbl>  <dbl> <date>     <chr>          
    #>  1     1 2023-04-18          1      5 2023-04-14 >= 5           
    #>  2     1 2023-04-18          2      5 2023-04-15 >= 5           
    #>  3     1 2023-04-18          3      5 2023-04-16 >= 5           
    #>  4     1 2023-04-18          4      5 2023-04-17 >= 5           
    #>  5     1 2023-04-18          5      5 2023-04-18 >= 5           
    #>  6     2 2023-04-21          1      4 2023-04-17 < 5            
    #>  7     2 2023-04-21          2      4 2023-04-18 < 5            
    #>  8     2 2023-04-21          3      4 2023-04-19 < 5            
    #>  9     2 2023-04-21          4      4 2023-04-20 < 5            
    #> 10     2 2023-04-21          5      4 2023-04-21 < 5            
    #> # ℹ 20 more rows
    

    I created a heatmap-like plot using :

    ggplot(df_long) +
      geom_bar(aes(drink_date, id, fill = drink_threshold), 
               stat = "identity", color = "black", width=1) +
      facet_wrap(~id, ncol = 1, strip.position="left", scales = "free_y") +
      scale_x_date(date_breaks = "1 day", expand =c(0.01,0.01)) +
      scale_fill_manual(values=c("springgreen2", "firebrick2")) +
      theme_gray() +
      theme(axis.text.y = element_blank(),
            axis.title.y = element_text(angle = 0, vjust = 0.5),
            line = element_blank(),
            axis.text.x = element_text(angle = 90, vjust = 0.5),
            panel.spacing = unit(0, "lines"),
            strip.text.y.left = element_text(angle = 0),
            strip.background = element_rect(colour="white", fill="white"))