I have distinct pairwise data points in column Utterance
; the data pairs have identical Timestamp
s. I'd like to pivot the distinct Utterance
pairs wider and align them on the same row with the same Timestamp
. I can do it, though I'm wondering if there's a way to do it more elegantly, more directly than this; specifically, can steps 2 and 3 be omitted or replaced by information related to the Timestamp
?
library(tidyverse)
df %>%
pivot_wider(names_from = Speaker, values_from = Utterance) %>%
mutate(Index_N = lag(Index_N)) %>% # can this step be omitted?
filter(!is.na(Index_N)) # can this step be omitted?
# A tibble: 3 × 10
Timestamp gest_dur stroke hold prep relax prehold nucleus Index_N Gesture_N
<chr> <int> <int> <int> <int> <int> <lgl> <int> <chr> <chr>
1 00:04:57.973 - 00:05:00.011 2038 1297 NA NA NA NA 1297 ((i: CV=0, SZ=0, FO=0, SL=0)) ((1_m: b h open palms in fingers exten…
2 00:05:00.011 - 00:05:00.924 913 252 169 492 NA NA 421 ((i: CV=0, SZ=0, FO=1, SL=0)) ((2_m: l h rotates palm to the left an…
3 00:05:00.924 - 00:05:01.847 923 247 209 467 NA NA 456 ((i: CV=0, SZ=1, FO=1, SL=0)) ((3_m: b h open palms facing in hands …
Data:
df <- structure(list(Timestamp = c("00:04:57.973 - 00:05:00.011", "00:04:57.973 - 00:05:00.011",
"00:05:00.011 - 00:05:00.924", "00:05:00.011 - 00:05:00.924",
"00:05:00.924 - 00:05:01.847", "00:05:00.924 - 00:05:01.847"),
Speaker = c("Index_N", "Gesture_N", "Index_N", "Gesture_N",
"Index_N", "Gesture_N"), Utterance = c("((i: CV=0, SZ=0, FO=0, SL=0))",
"((1_m: b h open palms in fingers extended @ct))", "((i: CV=0, SZ=0, FO=1, SL=0))",
"((2_m: l h rotates palm to the left and pushes away from lct to lperi))",
"((i: CV=0, SZ=1, FO=1, SL=0))", "((3_m: b h open palms facing in hands projected forwards @ctct))"
), gest_dur = c(NA, 2038L, NA, 913L, NA, 923L), stroke = c(NA,
1297L, NA, 252L, NA, 247L), hold = c(NA, NA, NA, 169L, NA,
209L), prep = c(NA, NA, NA, 492L, NA, 467L), relax = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), prehold = c(NA, NA, NA, NA, NA, NA), nucleus = c(NA, 1297L,
NA, 421L, NA, 456L)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Assuming that Timestamp
uniquely identifies pairs you can do:
library(dplyr)
library(tidyr)
df |>
pivot_wider(names_from = Speaker,
values_from = Utterance,
id_cols = Timestamp,
unused_fn = \(x) first(x, na_rm = TRUE))
# A tibble: 3 × 10
Timestamp Index_N Gesture_N gest_dur stroke hold prep relax prehold nucleus
<chr> <chr> <chr> <int> <int> <int> <int> <int> <lgl> <int>
1 00:04:57.973 - 00:05:00.011 ((i: CV=0, SZ=0, FO=0, SL=0)) ((1_m: b h open palms in … 2038 1297 NA NA NA NA 1297
2 00:05:00.011 - 00:05:00.924 ((i: CV=0, SZ=0, FO=1, SL=0)) ((2_m: l h rotates palm t… 913 252 169 492 NA NA 421
3 00:05:00.924 - 00:05:01.847 ((i: CV=0, SZ=1, FO=1, SL=0)) ((3_m: b h open palms fac… 923 247 209 467 NA NA 456
The unused_fn
argument applies a summarizing function to all unnamed columns, in this case taking the first non-NA value.