
tidyr::nest - what is it good for?


I use tidyr::unnest frequently, but I don't use nest; I can't figure out what problem it solves. The nest documentation gives examples like

as_tibble(iris) %>% nest(-Species)

But I don't see what to do with the result except immediately apply unnest to it and get iris back. Anything else I think of (like inner_join-ing it) I could do just as well with group_by instead. I've looked at other SO posts that used nest, e.g. "Irregular nest tidyverse", but they weren't enlightening.

nest - what problem is it solving? Can you give me examples of problems that are most straightforwardly solved using nest?

PS

The example code as_tibble(iris) %>% nest(-Species) now (tidyr 1.0.2) gives a warning. What's the new, right way to invoke it without listing every included column? as_tibble(iris) %>% nest(-Species, cols = everything()) didn't work.


Solution

  • Great question!

    nest is made for problems where we want to apply a function that takes a complex structure, such as a whole data frame, as its input. A very good example I can think of is the lm function, as demonstrated in the excellent book R for Data Science (r4ds): https://r4ds.had.co.nz/many-models.html#gapminder
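A minimal sketch of that idea on the iris data from the question (my own illustration, not taken from r4ds; the formula Sepal.Length ~ Petal.Length and the names by_species and slope are arbitrary choices). Nesting turns each species' rows into one tibble, which is exactly what lm() wants as its data argument, and purrr::map_dbl() pulls a single number back out of each fitted model.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per species; the remaining four columns are packed into a list-column `data`
by_species <- iris %>%
  group_by(Species) %>%
  nest()

# Each element of `data` is a whole data frame, so it can be handed straight to lm()
by_species <- by_species %>%
  mutate(
    model = map(data, ~ lm(Sepal.Length ~ Petal.Length, data = .x)),
    # Extract one number (the slope) from each fitted model
    slope = map_dbl(model, ~ coef(.x)[["Petal.Length"]])
  )

by_species %>% select(Species, slope)
```

Each row now keeps the species' raw data, its model and the extracted slope side by side, so later steps can reach back into any of them.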

    There is also a newer function in the tidyverse, nest_by() (added in dplyr 1.0.0). Below I show how to replace the old nest code with it, but both are very useful in the right context.
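For comparison, here is roughly the same per-species model written with nest_by(), assuming dplyr >= 1.0.0 (again, the formula and the names species_models and r2 are only illustrative). Because nest_by() returns a rowwise data frame, `data` refers to one tibble at a time, so no map() is needed, but a non-scalar result such as an lm object has to be wrapped in list().

```r
library(dplyr)  # nest_by() needs dplyr >= 1.0.0

# nest_by() groups, nests and returns a rowwise data frame in one step
species_models <- iris %>%
  nest_by(Species) %>%
  # Rowwise: `data` is a single tibble here, so lm() is called directly;
  # list() is needed because an lm object is not a single value
  mutate(model = list(lm(Sepal.Length ~ Petal.Length, data = data)))

# summarise() on the rowwise frame keeps Species and gives one value per row
r2 <- species_models %>%
  summarise(r_squared = summary(model)$r.squared)

r2
```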

    library(tidyverse)
    library(gapminder)
    
    # Collapse each country's rows into a single row holding a nested tibble
    by_country <- gapminder %>% 
      group_by(country, continent) %>% 
      nest()
    
    by_country
    #> # A tibble: 142 x 3
    #> # Groups:   country, continent [142]
    #>    country     continent data             
    #>    <fct>       <fct>     <list>           
    #>  1 Afghanistan Asia      <tibble [12 x 4]>
    #>  2 Albania     Europe    <tibble [12 x 4]>
    #>  3 Algeria     Africa    <tibble [12 x 4]>
    #>  4 Angola      Africa    <tibble [12 x 4]>
    #>  5 Argentina   Americas  <tibble [12 x 4]>
    #>  6 Australia   Oceania   <tibble [12 x 4]>
    #>  7 Austria     Europe    <tibble [12 x 4]>
    #>  8 Bahrain     Asia      <tibble [12 x 4]>
    #>  9 Bangladesh  Asia      <tibble [12 x 4]>
    #> 10 Belgium     Europe    <tibble [12 x 4]>
    #> # ... with 132 more rows
    
    
    # Fit a straight-line trend of life expectancy over time to one country's data
    country_model <- function(df) {
      lm(lifeExp ~ year, data = df)
    }
    
    
    # map() calls country_model() on every nested tibble, giving a list-column of lm objects
    by_country <- by_country %>% 
      mutate(model = map(data, country_model))
    by_country
    #> # A tibble: 142 x 4
    #> # Groups:   country, continent [142]
    #>    country     continent data              model 
    #>    <fct>       <fct>     <list>            <list>
    #>  1 Afghanistan Asia      <tibble [12 x 4]> <lm>  
    #>  2 Albania     Europe    <tibble [12 x 4]> <lm>  
    #>  3 Algeria     Africa    <tibble [12 x 4]> <lm>  
    #>  4 Angola      Africa    <tibble [12 x 4]> <lm>  
    #>  5 Argentina   Americas  <tibble [12 x 4]> <lm>  
    #>  6 Australia   Oceania   <tibble [12 x 4]> <lm>  
    #>  7 Austria     Europe    <tibble [12 x 4]> <lm>  
    #>  8 Bahrain     Asia      <tibble [12 x 4]> <lm>  
    #>  9 Bangladesh  Asia      <tibble [12 x 4]> <lm>  
    #> 10 Belgium     Europe    <tibble [12 x 4]> <lm>  
    #> # ... with 132 more rows
    
    
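    # The newer way (dplyr >= 1.0.0) is nest_by(), which returns a rowwise data frame:
    # `data` is one tibble per row, so call country_model() directly and wrap it in list()
    
    by_country_new <- gapminder %>% 
      nest_by(country, continent) %>% 
      mutate(model = list(country_model(data)))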
    
    by_country_new
    #> # A tibble: 142 x 4
    #> # Rowwise:  country, continent
    #>    country     continent               data model 
    #>    <fct>       <fct>     <list<tbl_df[,4]>> <list>
    #>  1 Afghanistan Asia                [12 x 4] <lm>  
    #>  2 Albania     Europe              [12 x 4] <lm>  
    #>  3 Algeria     Africa              [12 x 4] <lm>  
    #>  4 Angola      Africa              [12 x 4] <lm>  
    #>  5 Argentina   Americas            [12 x 4] <lm>  
    #>  6 Australia   Oceania             [12 x 4] <lm>  
    #>  7 Austria     Europe              [12 x 4] <lm>  
    #>  8 Bahrain     Asia                [12 x 4] <lm>  
    #>  9 Bangladesh  Asia                [12 x 4] <lm>  
    #> 10 Belgium     Europe              [12 x 4] <lm>  
    #> # ... with 132 more rows
    

    Created on 2020-06-07 by the reprex package (v0.3.0)
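Once the models sit in a list-column, a natural next step (similar to what the r4ds chapter does) is to turn them back into ordinary columns, which is where unnest comes in. A sketch, assuming the broom package is installed (the tidyverse package installs it, but library(tidyverse) does not attach it, hence broom::glance()); the name model_stats and the sort by r.squared are my own choices.

```r
library(tidyverse)
library(gapminder)

# Same nesting and per-country linear model as in the answer's first approach
by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(lifeExp ~ year, data = .x)))

# broom::glance() turns each model into a one-row tibble of fit statistics;
# unnest() then spreads those tibbles back out into ordinary columns
model_stats <- by_country %>%
  mutate(glance = map(model, broom::glance)) %>%
  select(country, continent, glance) %>%
  unnest(cols = glance) %>%
  ungroup()

# Countries where a straight-line trend over time fits worst
model_stats %>%
  arrange(r.squared) %>%
  select(country, continent, r.squared)
```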

    Also, here is the new way to nest iris by Species, which answers your PS:

    library(tidyverse)
    
    
    iris %>%
      group_by(Species) %>% 
      nest()
    #> # A tibble: 3 x 2
    #> # Groups:   Species [3]
    #>   Species    data             
    #>   <fct>      <list>           
    #> 1 setosa     <tibble [50 x 4]>
    #> 2 versicolor <tibble [50 x 4]>
    #> 3 virginica  <tibble [50 x 4]>
    

    Created on 2020-06-07 by the reprex package (v0.3.0)
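To answer the PS without group_by(): since tidyr 1.0.0, nest() takes named arguments whose values are tidy-select specifications, so you name the new list-column and exclude the key column with a minus sign instead of listing every column. A sketch (nested is an arbitrary name):

```r
library(tibble)
library(tidyr)

# Name the list-column `data` and pack every column except Species into it
nested <- nest(as_tibble(iris), data = -Species)
nested

# unnest() reverses it; `cols` is an argument of unnest(), not of nest()
unnest(nested, cols = data)
```

This also explains why `nest(-Species, cols = everything())` failed: nest() has no `cols` argument.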