rmergelongitudinal

Merging data to create a longitudinal dataframe in R


I have dataframe, df_master, which is comprised of longitudinal data for multiple patients ("record_id") where each row represents a day of follow-up ("time").

A second data frame has relapse time points for patients were they to relapse. In this toy example, patient A relapses once at time = 2, patient B relapses at time = 1 and time = 3, and patient C never relapses.

Toy data:

library(tidyverse)

record_id <- c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C")
time <- c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5)

df_master <- tibble(record_id, time)

record_id <- c("A", "B", "B")
day_relapse <- c(2, 1, 3)

df_relapses <- tibble(record_id, day_relapse)

Our two dataframes look like this:

> print(df_master)

# A tibble: 12 × 2
   record_id  time
   <chr>     <dbl>
 1 A             1
 2 A             2
 3 A             3
 4 B             1
 5 B             2
 6 B             3
 7 B             4
 8 C             1
 9 C             2
10 C             3
11 C             4
12 C             5

> print(df_relapses)
# A tibble: 3 × 2
  record_id day_relapse
  <chr>           <dbl>
1 A                   2
2 B                   1
3 B                   3

I'd like to create a merged dataframe that includes all of the data from df_master where the variable day_relapse is added to the new dataframe, df_merged, where the value for day_relapse is NA except for when time == day_relapse for a given record_id in which case day_relapse will equal the time value. The merged dataframe should look like this:

# A tibble: 12 × 3
   record_id  time day_relapse
   <chr>     <dbl>       <dbl>
 1 A             1          NA
 2 A             2           2
 3 A             3          NA
 4 B             1           1
 5 B             2          NA
 6 B             3           3
 7 B             4          NA
 8 C             1          NA
 9 C             2          NA
10 C             3          NA
11 C             4          NA
12 C             5          NA

How do I generate the merged dataframe? The strange formatting of the merged df is required for a type of swim plot that I am creating. Thanks in advance.


Solution

  • We could duplicate the day_relapse column with the same name (time) as the other table we want to join with. Then we'll get day_relapse when it's available in the 2nd table and NA when it's not.

    df_master |>
      left_join(df_relapses |> mutate(time = day_relapse))
    

    Result

    Joining with `by = join_by(record_id, time)`
    # A tibble: 12 × 3
       record_id  time day_relapse
       <chr>     <dbl>       <dbl>
     1 A             1          NA
     2 A             2           2
     3 A             3          NA
     4 B             1           1
     5 B             2          NA
     6 B             3           3
     7 B             4          NA
     8 C             1          NA
     9 C             2          NA
    10 C             3          NA
    11 C             4          NA
    12 C             5          NA