I have a long data frame that I want to widen. The long data frame includes duplicates and I want to keep them as unique columns in the wide format--but the wide format output combines the duplicates in a single column-list. I've tried a few things with pivot_wider() and thought unnest() might be what I need. Below is the closest that I've been able to get to what I'm looking for, but it's not quite there.
In the example below, I want the wide data frame to include the values of the variable "surprise" for each "box"--with as many columns as there are unique and non-unique surprises. Conceptually, what I'm after is unique and non-unique surprises in a box.
Is it possible to widen long data in this way?
library(tidyverse)
# some data; long format
long_box <- c("A", "A", "A", "B", "B", "B", "C", "C")
surprise <- c("apple", "orange", "orange", "apple", "banana", "insects", "apple", "insects")
# the data frame I have
tibble(long_box, surprise)
#> # A tibble: 8 x 2
#> long_box surprise
#> <chr> <chr>
#> 1 A apple
#> 2 A orange
#> 3 A orange
#> 4 B apple
#> 5 B banana
#> 6 B insects
#> 7 C apple
#> 8 C insects
# same data, wide format
wide_box <- c("A", "B", "C")
a <- c(rep("apple",3))
b <- c("orange", "banana", "insects")
c <- c("orange", "insects", NA)
# the data frame format I want
tibble(wide_box, a, b, c) %>%
rename(suprise_1 = a,
suprise_2 = b,
suprise_3 = c)
#> # A tibble: 3 x 4
#> wide_box suprise_1 suprise_2 suprise_3
#> <chr> <chr> <chr> <chr>
#> 1 A apple orange orange
#> 2 B apple banana insects
#> 3 C apple insects <NA>
# this is what I've tried to get from long to wide
tibble(long_box, surprise) %>%
pivot_wider(id_cols = long_box,
names_from = surprise,
values_from = surprise)
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> # A tibble: 3 x 5
#> long_box apple orange banana insects
#> <chr> <list> <list> <list> <list>
#> 1 A <chr [1]> <chr [2]> <NULL> <NULL>
#> 2 B <chr [1]> <NULL> <chr [1]> <chr [1]>
#> 3 C <chr [1]> <NULL> <NULL> <chr [1]>
Created on 2021-07-15 by the reprex package (v2.0.0)
Create a sequence column and it should work
library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
tibble(long_box, surprise) %>%
mutate(nm1= str_c('suprise_', rowid(long_box))) %>%
pivot_wider(names_from = nm1, values_from = surprise)
-output
# A tibble: 3 x 4
long_box suprise_1 suprise_2 suprise_3
<chr> <chr> <chr> <chr>
1 A apple orange orange
2 B apple banana insects
3 C apple insects <NA>