rtidyr

Pivot distinct value pairs wider using timestamp


I have distinct pairwise data points in column Utterance; the data pairs have identical Timestamps. I'd like to pivot the distinct Utterance pairs wider and align them on the same row with the same Timestamp. I can do it, though I'm wondering if there's a way to do it more elegantly, more directly than this; specifically, can steps 2 and 3 be omitted or replaced by information related to the Timestamp?

library(tidyverse)
df %>%
  pivot_wider(names_from = Speaker, values_from = Utterance) %>%
  mutate(Index_N = lag(Index_N)) %>%      # can this step be omitted?
  filter(!is.na(Index_N))                 # can this step be omitted?
# A tibble: 3 × 10
  Timestamp                   gest_dur stroke  hold  prep relax prehold nucleus Index_N                       Gesture_N                              
  <chr>                          <int>  <int> <int> <int> <int> <lgl>     <int> <chr>                         <chr>                                  
1 00:04:57.973 - 00:05:00.011     2038   1297    NA    NA    NA NA         1297 ((i: CV=0, SZ=0, FO=0, SL=0)) ((1_m: b h open palms in fingers exten…
2 00:05:00.011 - 00:05:00.924      913    252   169   492    NA NA          421 ((i: CV=0, SZ=0, FO=1, SL=0)) ((2_m: l h rotates palm to the left an…
3 00:05:00.924 - 00:05:01.847      923    247   209   467    NA NA          456 ((i: CV=0, SZ=1, FO=1, SL=0)) ((3_m: b h open palms facing in hands …

Data:

df <- structure(list(Timestamp = c("00:04:57.973 - 00:05:00.011", "00:04:57.973 - 00:05:00.011", 
                                   "00:05:00.011 - 00:05:00.924", "00:05:00.011 - 00:05:00.924", 
                                   "00:05:00.924 - 00:05:01.847", "00:05:00.924 - 00:05:01.847"), 
                     Speaker = c("Index_N", "Gesture_N", "Index_N", "Gesture_N", 
                                 "Index_N", "Gesture_N"), Utterance = c("((i: CV=0, SZ=0, FO=0, SL=0))", 
                                                                        "((1_m: b h open palms in fingers extended @ct))", "((i: CV=0, SZ=0, FO=1, SL=0))", 
                                                                        "((2_m: l h rotates palm to the left and pushes away from lct to lperi))", 
                                                                        "((i: CV=0, SZ=1, FO=1, SL=0))", "((3_m: b h open palms facing in hands projected forwards @ctct))"
                                 ), gest_dur = c(NA, 2038L, NA, 913L, NA, 923L), stroke = c(NA, 
                                                                                            1297L, NA, 252L, NA, 247L), hold = c(NA, NA, NA, 169L, NA, 
                                                                                                                                 209L), prep = c(NA, NA, NA, 492L, NA, 467L), relax = c(NA_integer_, 
                                                                                                                                                                                        NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
                                                                                                                                 ), prehold = c(NA, NA, NA, NA, NA, NA), nucleus = c(NA, 1297L, 
                                                                                                                                                                                     NA, 421L, NA, 456L)), row.names = c(NA, -6L), class = c("tbl_df", 
                                                                                                                                                                                                                                             "tbl", "data.frame"))

Solution

  • Assuming that Timestamp uniquely identifies pairs you can do:

    library(dplyr)
    library(tidyr)
    
    df |> 
      pivot_wider(names_from = Speaker,
                  values_from = Utterance,
                  id_cols = Timestamp,
                  unused_fn = \(x) first(x, na_rm = TRUE))
    
    # A tibble: 3 × 10
      Timestamp                   Index_N                       Gesture_N                  gest_dur stroke  hold  prep relax prehold nucleus
      <chr>                       <chr>                         <chr>                         <int>  <int> <int> <int> <int> <lgl>     <int>
    1 00:04:57.973 - 00:05:00.011 ((i: CV=0, SZ=0, FO=0, SL=0)) ((1_m: b h open palms in …     2038   1297    NA    NA    NA NA         1297
    2 00:05:00.011 - 00:05:00.924 ((i: CV=0, SZ=0, FO=1, SL=0)) ((2_m: l h rotates palm t…      913    252   169   492    NA NA          421
    3 00:05:00.924 - 00:05:01.847 ((i: CV=0, SZ=1, FO=1, SL=0)) ((3_m: b h open palms fac…      923    247   209   467    NA NA          456
    

    The unused_fn argument applies a summarizing function to all unnamed columns, in this case taking the first non-NA value.