rdataframelistflattendata-wrangling

Flattening lists within a dataframe within a dataframe, whilst preserving names


I present here an input data frame that contains lists of data frames that contains lists. Some of the bottom level lists are empty and some lists have length greater than one. I am looking for some R code that will turn the input into the output (also given below)

input = structure(list(pet = c("colin", "fred", "roy"),
                       fruit = list(structure(list(apple = "red",
                                                   banana = "yellow", 
                                                   mango = "green"),
                                              class = "data.frame",
                                              row.names = 1L), 
                                    structure(list(apple = "mouldy",
                                                   banana = "bruised",
                                                   mango = "cut"),
                                              class = "data.frame",
                                              row.names = 1L), 
                                    structure(list(apple = c("windfall", "cooking"),
                                                   banana = c("picked", "ripe"),
                                                   mango = c("stolen", "round")),
                                              class = "data.frame",
                                              row.names = 1:2)), 
                       flavours = list(structure(list()),
                                       structure(list(sweet = "very",
                                                      sour = "ouch", 
                                                      spicy = "hot"),
                                                 class = "data.frame",
                                                 row.names = 1L), 
                                       structure(list(sweet = c("sugary", "calories"),
                                                      sour = c("citrus", "lemon"),
                                                      spicy = c("inferno", "burning")),
                                                 class = "data.frame",
                                                 row.names = 1:2))),
                  row.names = c(NA, 3L),
                  class = "data.frame")

output = data.frame(pet = c("colin", "fred", "roy", "roy"),
                    fruit.apple = c("red", "mouldy", "windfall", "cooking"),
                    fruit.banana = c("yellow", "bruised", "picked", "ripe"),
                    fruit.mango = c("green", "cut", "stolen", "round"),
                    flavours.sweet = c(NA, "very", "sugary", "calories"),
                    flavours.sour = c(NA, "ouch", "citrus", "lemon"),
                    flavours.spicy = c(NA, "hot", "inferno", "burning"))

The features must be that the output data frame has column names that are the concatenation of the names already existing in the input, separated by a dot. Where the inner data frame has a list that's empty, this should give rise to NA in the final data frame, and not throw any errors. Where either of the inner data frames has a list with length greater than one, this length will be matched across all lists in that row (it's a design of the input data), and should give rise to two rows in the output with the corresponding values.

I have tried a wide combination of approaches already using such things as rrapply::rrapply(), data.table::'s .SDcol within lapply, unlist(), and every flatten() function that I can find, from purrr:: to jsonlite:: ! None have worked for me so far.

I found some Stack Overflow links (here and here) that came close to what I wanted, but none delivered the correct column names, dealing with empty lists, and lists of greater than length one all at once.

Can you help please? Thank you.


Solution

  • Here's one way to do this with tidyr::unnest

    library(tidyr)
    
    input |>
      unnest_wider(col = c(fruit, flavours),
                   names_sep = '.') |>
      unnest(cols = -pet)
    #> # A tibble: 4 × 7
    #>   pet   fruit.apple fruit.banana fruit.mango flavours.sweet flavours.sour
    #>   <chr> <chr>       <chr>        <chr>       <chr>          <chr>        
    #> 1 colin red         yellow       green       <NA>           <NA>         
    #> 2 fred  mouldy      bruised      cut         very           ouch         
    #> 3 roy   windfall    picked       stolen      sugary         citrus       
    #> 4 roy   cooking     ripe         round       calories       lemon        
    #> # ℹ 1 more variable: flavours.spicy <chr>