rdateorganizationlongitudinalreorganize

Reorganize multiple variables based on date variable in R


If I have a dataset that has scores from the same measure collected at different time points, how can I organize those dates/times so they represents a timepoint after a certain date? Is this possible to do in R or would it be easier for me to do this in another program?

I have a dataset that currently looks like this:

id  date        score1_date score1  score2_date score2  score3_date score3
101 1/6/2020    1/1/2020    20      1/8/2020    18      1/15/2020   16
102 2/27/2020   2/14/2020   16      2/21/2020   16      2/28/2020   10
103 1/10/2020   1/7/2020    30      1/14/2020   25      1/21/2020   20
104 3/5/2020    3/6/2020    40      3/13/2020   42      3/20/2020   40

I want to find the closest [score#_date] to [date] and identify that as [time1] and then have everything that follows as [time2], [time3], etc.

Here is the code for that above table:

structure(list(id = c(101, 102, 103, 104), date = structure(c(18267, 
18319, 18271, 18326), class = "Date"), score1_date = structure(c(18262, 
18306, 18268, 18327), class = "Date"), score1 = c(20, 16, 30, 
40), score2_date = structure(c(18269, 18313, 18275, 18334), class = "Date"), 
    score2 = c(18, 16, 25, 42), score3_date = structure(c(18276, 
    18320, 18282, 18341), class = "Date"), score3 = c(16, 10, 
    20, 40)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))

So I would eventually want the dataset to have variables that look something like this:

id  date        time1_date  time1_score time2_date  time2_score time3_date  time3_score
101 1/6/2020    1/8/2020    18          1/15/2020   16          NA          NA
102 2/27/2020   2/28/2020   10          NA          NA          NA          NA
103 1/10/2020   1/7/2020    30          1/14/2020   25          1/21/2020   20
104 3/5/2020    3/6/2020    40          3/13/2020   42          3/20/2020   40

Thank you so much!


Solution

  • Using tidyverse functions you can do :

    library(dplyr)
    library(tidyr)
    
    df %>%
      #Rename date column to base_date
      rename(base_date = date) %>%
      #Rename score1, score2 etc to score1_value, score2_value etc
      rename_with(~paste0(., '_value'), matches('^score\\d+$')) %>%
      #get the data in long format with date and value as two columns
      pivot_longer(cols = starts_with('score'), 
                   names_to = c('score', '.value'), 
                   names_sep = '_') %>%
      group_by(id) %>%
      #Keep only those date where the date is greater than closest date
      filter(date >= date[which.min(abs(date  - base_date))]) %>%
      #Arrange the data
      arrange(id, date) %>%
      #Create new column name
      mutate(score = paste0('time', row_number())) %>%
      ungroup %>%
      #Get the data in wide format
      pivot_wider(names_from = score, values_from = c(date, value)) %>%
      #Arrange the columns
      select(id, base_date, order(suppressWarnings(readr::parse_number(names(.)))))
    
    #    id base_date  date_time1 value_time1 date_time2 value_time2 date_time3 value_time3
    #  <dbl> <date>     <date>           <dbl> <date>           <dbl> <date>           <dbl>
    #1   101 2020-01-06 2020-01-08          18 2020-01-15          16 NA                  NA
    #2   102 2020-02-27 2020-02-28          10 NA                  NA NA                  NA
    #3   103 2020-01-10 2020-01-07          30 2020-01-14          25 2020-01-21          20
    #4   104 2020-03-05 2020-03-06          40 2020-03-13          42 2020-03-20          40