rtidyr

How do I pivot_wider() so that duplicates are maintained in their own columns?


I have a long data frame that I want to widen. The long data frame includes duplicates and I want to keep them as unique columns in the wide format--but the wide format output combines the duplicates in a single column-list. I've tried a few things with pivot_wider() and thought unnest() might be what I need. Below is the closest that I've been able to get to what I'm looking for, but it's not quite there.

In the example below, I want the wide data frame to include the values of the variable "surprise" for each "box"--with as many columns as there are unique and non-unique surprises. Conceptually, what I'm after is unique and non-unique surprises in a box.

Is it possible to widen long data in this way?

library(tidyverse)

# some data; long format
long_box <- c("A", "A", "A", "B", "B", "B", "C", "C")
surprise <- c("apple", "orange", "orange", "apple", "banana", "insects", "apple", "insects")

# the data frame I have
tibble(long_box, surprise)

#> # A tibble: 8 x 2
#>   long_box surprise
#>   <chr>    <chr>   
#> 1 A        apple   
#> 2 A        orange  
#> 3 A        orange  
#> 4 B        apple   
#> 5 B        banana  
#> 6 B        insects 
#> 7 C        apple   
#> 8 C        insects

# same data, wide format
wide_box <- c("A", "B", "C")
a <- c(rep("apple",3))
b <- c("orange", "banana", "insects")
c <- c("orange", "insects", NA)

# the data frame format I want
tibble(wide_box, a, b, c) %>% 
  rename(suprise_1 = a,
         suprise_2 = b,
         suprise_3 = c)

#> # A tibble: 3 x 4
#>   wide_box suprise_1 suprise_2 suprise_3
#>   <chr>    <chr>     <chr>     <chr>    
#> 1 A        apple     orange    orange   
#> 2 B        apple     banana    insects  
#> 3 C        apple     insects   <NA>

# this is what I've tried to get from long to wide
tibble(long_box, surprise) %>% 
  pivot_wider(id_cols = long_box,
              names_from = surprise,
              values_from = surprise)

#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> # A tibble: 3 x 5
#>   long_box apple     orange    banana    insects  
#>   <chr>    <list>    <list>    <list>    <list>   
#> 1 A        <chr [1]> <chr [2]> <NULL>    <NULL>   
#> 2 B        <chr [1]> <NULL>    <chr [1]> <chr [1]>
#> 3 C        <chr [1]> <NULL>    <NULL>    <chr [1]>

Created on 2021-07-15 by the reprex package (v2.0.0)


Solution

  • Create a sequence column and it should work

    library(dplyr)
    library(tidyr)
    library(data.table)
    library(stringr)
    tibble(long_box, surprise) %>%
         mutate(nm1= str_c('suprise_', rowid(long_box))) %>% 
         pivot_wider(names_from = nm1, values_from = surprise)
    

    -output

    # A tibble: 3 x 4
      long_box suprise_1 suprise_2 suprise_3
      <chr>    <chr>     <chr>     <chr>    
    1 A        apple     orange    orange   
    2 B        apple     banana    insects  
    3 C        apple     insects   <NA>