r list tidyverse

How to create tibble from a nested list, where one part of the list is nested slightly differently for some entries than for others


I am wondering how to create a tibble from a list in which one sublist contains a further list of entries for some elements, but holds the variables directly for others. That is a bit vague, so I hope the following example, which is based on this question (How to create data frame from list, selecting which sublist to focus on), will help:

library(tidyverse)
library(tibblify)  # provides tibblify(), used in the pipelines below
listexample = list(books = list(list(
    title="Book 1",
    entry = "entry 1",
    publisher = "Books Unlimited",
    authors = list(
        list(name="bob", location="north dakota",
             subject = list(
                list(area="Fiction", abbrev="FI"),
                list(area="Mystery", abbrev="MY"))),
        list(name="susan", location="california",
             subject = list(area="Fiction", abbrev="FI")),
        list(name="tim",
             subject = list(
                list(area="Fiction", abbrev="FI"),
                list(area="Mystery", abbrev="MY"),
                list(area="Suspense", abbrev="SU")))),
    isbn = "1358",
    universities = list(
        list(univ="univ1"),
        list(univ="univ2"))
),
list(
    title="Book 2",
    entry = "entry 2",
    publisher = "Books Unified",
    authors = list(
        list(name="tom", location="north dakota",
             subject = list(
                        list(area="Fiction", abbrev="FI"),
                        list(area="Mystery", abbrev="MY"))),
        list(name="sally", location="california",
             subject = list(
                list(area="Nonfiction", abbrev="NF"),
                list(area="Biography", abbrev="BIO"))),
        list(name="erica", location="berlin",
              subject = list(area="Fiction", abbrev="FI"))),
    isbn = "1258",
    universities = list(
        list(univ="univ5"),
        list(univ="univ2"),
        list(univ="univ99"),
        list(univ="univ2"),
        list(univ="univ3"))
)   
),
misc = list(name="Jim Smith", location="Alaska"))

(To make the example make a bit more sense, think of an author's subject as the subject they usually write in, even if it differs from the subject of the particular book.)

The issue is that, within their subject list, some authors (bob, tim, tom, and sally) have two (or three) areas (each contained in a separate sublist), while other authors (susan and erica) have only one area (not contained in a separate sublist).
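The difference between the two shapes can be checked directly. The following is a minimal base R sketch, assuming two hand-copied subject entries shaped like those in listexample; the helper name is_flat_subject is hypothetical and only illustrates the distinction:

```r
# Two subject entries shaped like those in listexample (copied by hand)
subject_flat   <- list(area = "Fiction", abbrev = "FI")        # like susan
subject_nested <- list(list(area = "Fiction", abbrev = "FI"),
                       list(area = "Mystery", abbrev = "MY"))  # like bob

# Hypothetical helper: a flat entry carries names ("area", "abbrev") at the
# top level, while a list of entries is unnamed at the top level
is_flat_subject <- function(subject) !is.null(names(subject))

is_flat_subject(subject_flat)    # TRUE
is_flat_subject(subject_nested)  # FALSE
```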

Modifying an answer to the question linked above, I tried:

listexample %>%
    .$books %>%
    tibblify %>%
    select(title, authors) %>%
    unnest(authors) %>% 
    unnest_wider(subject, names_sep="a")

This produces:

# A tibble: 6 × 8
  title  name  location     subjecta1        subjecta2        subjectaarea subjectaabbrev subjecta3       
  <chr>  <chr> <chr>        <list>           <list>           <chr>        <chr>          <list>          
1 Book 1 bob   north dakota <named list [2]> <named list [2]> NA           NA             <NULL>          
2 Book 1 susan california   <NULL>           <NULL>           Fiction      FI             <NULL>          
3 Book 1 tim   NA           <named list [2]> <named list [2]> NA           NA             <named list [2]>
4 Book 2 tom   north dakota <named list [2]> <named list [2]> NA           NA             <NULL>          
5 Book 2 sally california   <named list [2]> <named list [2]> NA           NA             <NULL>          
6 Book 2 erica berlin       <NULL>           <NULL>           Fiction      FI             <NULL> 

The issue is that the expansion puts susan's and erica's entries in different columns from everyone else's. Let's say that I don't want to include the abbreviation column at all. How can I get three columns, subject1area, subject2area, and subject3area, so that the result looks like this:

  title  name  location     subject1area subject2area subject3area
  <chr>  <chr> <chr>        <chr>        <chr>        <chr>
1 Book 1 bob   north dakota Fiction      Mystery      NA
2 Book 1 susan california   Fiction      NA           NA
3 Book 1 tim   NA           Fiction      Mystery      Suspense
4 Book 2 tom   north dakota Fiction      Mystery      NA
5 Book 2 sally california   Nonfiction   Biography    NA
6 Book 2 erica berlin       Fiction      NA           NA

Solution

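  • The attempt in the question is actually quite close. The lines below are the same as in the question except for two additions: (1) the mutate line, which regularizes the varying-depth subject column by reducing each entry to a plain character vector of areas (bind_rows turns both a single flat entry and a list of entries into a tibble with one row per area), so that unnest_wider can process every row uniformly; and (2) the rename_with line, which ensures that the column names are as in the question.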

    listexample %>%
      .$books %>%
      tibblify %>% 
      select(title, authors) %>%
      unnest(authors) %>%
      mutate(subject = lapply(subject, \(x) bind_rows(x)[["area"]])) %>%
      unnest_wider(subject, names_sep = "a") %>%
      rename_with(~ sub("subjecta(\\d+)", "subject\\1area", .x))
    

    giving

    # A tibble: 6 × 6
      title  name  location     subject1area subject2area subject3area
      <chr>  <chr> <chr>        <chr>        <chr>        <chr>       
    1 Book 1 bob   north dakota Fiction      Mystery      <NA>        
    2 Book 1 susan california   Fiction      <NA>         <NA>        
    3 Book 1 tim   <NA>         Fiction      Mystery      Suspense    
    4 Book 2 tom   north dakota Fiction      Mystery      <NA>        
    5 Book 2 sally california   Nonfiction   Biography    <NA>        
    6 Book 2 erica berlin       Fiction      <NA>         <NA>
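
To see why the mutate line evens out the two shapes, it may help to run its inner expression on one entry of each kind. This is a minimal sketch, assuming hand-copied subject entries rather than the tibblify output itself; the helper name areas_of is hypothetical and simply wraps the same bind_rows(x)[["area"]] expression:

```r
library(dplyr)

# One flat subject entry and one list of subject entries, copied by hand
subject_flat   <- list(area = "Fiction", abbrev = "FI")
subject_nested <- list(list(area = "Fiction", abbrev = "FI"),
                       list(area = "Mystery", abbrev = "MY"))

# Hypothetical helper wrapping the expression used in the mutate() line:
# bind_rows() builds a tibble with one row per area in both cases,
# and [["area"]] keeps only the area column as a character vector
areas_of <- function(x) bind_rows(x)[["area"]]

areas_of(subject_flat)    # "Fiction"
areas_of(subject_nested)  # "Fiction" "Mystery"
```

Because every entry then becomes an unnamed character vector, unnest_wider only has positions 1, 2, 3 to spread into columns, which is what yields subjecta1, subjecta2, and subjecta3 before the rename.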