I am wondering how to create a tibble from a nested list in which one sublist holds a further list of records for some entries, but holds the fields directly for other entries. That is a bit vague, so I hope that the following example, which is based on the question "How to create data frame from list, selecting which sublist to focus on", will help:
library(tidyverse)
library(tibblify)  # provides tibblify(); it is not attached by library(tidyverse)
listexample = list(books = list(list(
    title = "Book 1",
    entry = "entry 1",
    publisher = "Books Unlimited",
    authors = list(
      list(name = "bob", location = "north dakota",
           subject = list(
             list(area = "Fiction", abbrev = "FI"),
             list(area = "Mystery", abbrev = "MY"))),
      list(name = "susan", location = "california",
           subject = list(area = "Fiction", abbrev = "FI")),
      list(name = "tim",
           subject = list(
             list(area = "Fiction", abbrev = "FI"),
             list(area = "Mystery", abbrev = "MY"),
             list(area = "Suspense", abbrev = "SU")))),
    isbn = "1358",
    universities = list(
      list(univ = "univ1"),
      list(univ = "univ2"))
  ),
  list(
    title = "Book 2",
    entry = "entry 2",
    publisher = "Books Unified",
    authors = list(
      list(name = "tom", location = "north dakota",
           subject = list(
             list(area = "Fiction", abbrev = "FI"),
             list(area = "Mystery", abbrev = "MY"))),
      list(name = "sally", location = "california",
           subject = list(
             list(area = "Nonfiction", abbrev = "NF"),
             list(area = "Biography", abbrev = "BIO"))),
      list(name = "erica", location = "berlin",
           subject = list(area = "Fiction", abbrev = "FI"))),
    isbn = "1258",
    universities = list(
      list(univ = "univ5"),
      list(univ = "univ2"),
      list(univ = "univ99"),
      list(univ = "univ2"),
      list(univ = "univ3"))
  )
),
misc = list(name = "Jim Smith", location = "Alaska"))
(To make the example make a bit more sense, think of an author's subject as the subject they usually write in, even if it differs from the subject of the particular book.)
The issue is that, within their subject list, some authors (bob, tim, tom, and sally) have two (or three) areas (each contained in a separate sublist), while other authors (susan and erica) have only one area (not contained in a separate sublist).
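To make the difference concrete, here is a minimal base R check of the two shapes. The helper name is_single_record and the variables susan_subject and bob_subject are made up for illustration; they are not part of the data above.

```r
# Hypothetical helper: a single record has names at its top level,
# whereas a list of records is an unnamed list of named lists
is_single_record <- function(x) !is.null(names(x))

# Shape used for susan and erica: the fields sit directly in `subject`
susan_subject <- list(area = "Fiction", abbrev = "FI")

# Shape used for bob, tim, tom, and sally: each area is its own sublist
bob_subject <- list(
  list(area = "Fiction", abbrev = "FI"),
  list(area = "Mystery", abbrev = "MY")
)

is_single_record(susan_subject)  # TRUE
is_single_record(bob_subject)    # FALSE
```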
After modifying an answer from the question linked to above, I try:
listexample %>%
  .$books %>%
  tibblify %>%
  select(title, authors) %>%
  unnest(authors) %>%
  unnest_wider(subject, names_sep="a")
This produces:
# A tibble: 6 × 8
  title  name  location     subjecta1        subjecta2        subjectaarea subjectaabbrev subjecta3
  <chr>  <chr> <chr>        <list>           <list>           <chr>        <chr>          <list>
1 Book 1 bob   north dakota <named list [2]> <named list [2]> NA           NA             <NULL>
2 Book 1 susan california   <NULL>           <NULL>           Fiction      FI             <NULL>
3 Book 1 tim   NA           <named list [2]> <named list [2]> NA           NA             <named list [2]>
4 Book 2 tom   north dakota <named list [2]> <named list [2]> NA           NA             <NULL>
5 Book 2 sally california   <named list [2]> <named list [2]> NA           NA             <NULL>
6 Book 2 erica berlin       <NULL>           <NULL>           Fiction      FI             <NULL>
The problem is that the expansion puts susan's and erica's entries in different columns from everyone else's. Let's say that I don't want to include the abbreviation column at all. How can I get three columns, subject1area, subject2area, and subject3area, so that the result looks like:
  title  name  location     subject1area subject2area subject3area
  <chr>  <chr> <chr>        <chr>        <chr>        <chr>
1 Book 1 bob   north dakota Fiction      Mystery      NA
2 Book 1 susan california   Fiction      NA           NA
3 Book 1 tim   NA           Fiction      Mystery      Suspense
4 Book 2 tom   north dakota Fiction      Mystery      NA
5 Book 2 sally california   Nonfiction   Biography    NA
6 Book 2 erica berlin       Fiction      NA           NA
The attempt in the question is actually quite close. The lines below are the same as there except for two additions: (1) the mutate line, which regularizes the varying-depth subject column so that unnest_wider can process it uniformly, and (2) the rename_with line, which ensures the column names are as in the question.
listexample %>%
  .$books %>%
  tibblify %>%
  select(title, authors) %>%
  unnest(authors) %>%
  mutate(subject = lapply(subject, \(x) bind_rows(x)[["area"]])) %>%
  unnest_wider(subject, names_sep = "a") %>%
  rename_with(~ sub("subjecta(\\d+)", "subject\\1area", .x))
giving
# A tibble: 6 × 6
  title  name  location     subject1area subject2area subject3area
  <chr>  <chr> <chr>        <chr>        <chr>        <chr>
1 Book 1 bob   north dakota Fiction      Mystery      <NA>
2 Book 1 susan california   Fiction      <NA>         <NA>
3 Book 1 tim   <NA>         Fiction      Mystery      Suspense
4 Book 2 tom   north dakota Fiction      Mystery      <NA>
5 Book 2 sally california   Nonfiction   Biography    <NA>
6 Book 2 erica berlin       Fiction      <NA>         <NA>
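To see why the mutate and rename_with steps work, here is a small sketch on hand-built inputs. The variables single, multi, and old_names are made up for illustration and stand in for the two shapes of subject and for the column names that unnest_wider produces with names_sep = "a"; the sketch assumes a recent dplyr is installed.

```r
library(dplyr)

# One record stored directly (the susan/erica shape)
single <- list(area = "Fiction", abbrev = "FI")

# Several records wrapped in an outer list (the bob/tim/tom/sally shape)
multi <- list(
  list(area = "Fiction", abbrev = "FI"),
  list(area = "Mystery", abbrev = "MY")
)

# bind_rows() turns either shape into a tibble with an `area` column,
# so extracting that column gives a plain character vector in both cases
areas_single <- bind_rows(single)[["area"]]  # "Fiction"
areas_multi  <- bind_rows(multi)[["area"]]   # "Fiction" "Mystery"

# The rename step: the regex captures the numeric index and moves it
# in front of "area"; names that do not match are left unchanged
old_names <- c("title", "subjecta1", "subjecta2")
new_names <- sub("subjecta(\\d+)", "subject\\1area", old_names)
# "title" "subject1area" "subject2area"
```

Because every subject entry becomes an unnamed character vector after the mutate step, unnest_wider numbers the resulting columns consistently instead of mixing index-based and name-based columns.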