I use tidyr::unnest frequently. But I don't use nest; I can't figure out what problem it solves. The nest documentation gives examples like
as_tibble(iris) %>% nest(-Species)
But I don't see what to do with the result, except to immediately apply unnest to it and get iris back. Anything else I think of - like inner_joining it - I could do just as well if I'd group_by'ed it instead. I've looked at other SO posts which used nest, e.g. Irregular nest tidyverse, but they didn't edify.
nest - what problem is it solving? Can you give me examples of a problem which is most straightforwardly solved using nest?
The example code as_tibble(iris) %>% nest(-Species) now (tidyr 1.0.2) gives a warning. What's the new, right way to invoke it without listing every included column? as_tibble(iris) %>% nest(-Species, cols = everything()) didn't work.
Great question!
nest is meant for problems where you want to apply a function that takes a complex structure, such as a whole data frame, as its input. A very good example is the lm function, as demonstrated in the excellent book R for Data Science (r4ds): https://r4ds.had.co.nz/many-models.html#gapminder
There is also a newer tidyverse function called nest_by (in dplyr). The code below first shows the older group_by + nest approach and then how nest_by can replace it; both are useful in the right context.
library(tidyverse)
library(gapminder)
by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest()
by_country
#> # A tibble: 142 x 3
#> # Groups: country, continent [142]
#> country continent data
#> <fct> <fct> <list>
#> 1 Afghanistan Asia <tibble [12 x 4]>
#> 2 Albania Europe <tibble [12 x 4]>
#> 3 Algeria Africa <tibble [12 x 4]>
#> 4 Angola Africa <tibble [12 x 4]>
#> 5 Argentina Americas <tibble [12 x 4]>
#> 6 Australia Oceania <tibble [12 x 4]>
#> 7 Austria Europe <tibble [12 x 4]>
#> 8 Bahrain Asia <tibble [12 x 4]>
#> 9 Bangladesh Asia <tibble [12 x 4]>
#> 10 Belgium Europe <tibble [12 x 4]>
#> # ... with 132 more rows
country_model <- function(df) {
  lm(lifeExp ~ year, data = df)
}
by_country <- by_country %>%
  mutate(model = map(data, country_model))
by_country
#> # A tibble: 142 x 4
#> # Groups: country, continent [142]
#> country continent data model
#> <fct> <fct> <list> <list>
#> 1 Afghanistan Asia <tibble [12 x 4]> <lm>
#> 2 Albania Europe <tibble [12 x 4]> <lm>
#> 3 Algeria Africa <tibble [12 x 4]> <lm>
#> 4 Angola Africa <tibble [12 x 4]> <lm>
#> 5 Argentina Americas <tibble [12 x 4]> <lm>
#> 6 Australia Oceania <tibble [12 x 4]> <lm>
#> 7 Austria Europe <tibble [12 x 4]> <lm>
#> 8 Bahrain Asia <tibble [12 x 4]> <lm>
#> 9 Bangladesh Asia <tibble [12 x 4]> <lm>
#> 10 Belgium Europe <tibble [12 x 4]> <lm>
#> # ... with 132 more rows
# The new way is using nest_by
# (the result is rowwise, so `data` is one country's tibble inside mutate)
by_country_new <- gapminder %>%
  nest_by(country, continent) %>%
  mutate(model = list(country_model(data)))
by_country_new
#> # A tibble: 142 x 4
#> # Rowwise: country, continent
#> country continent data model
#> <fct> <fct> <list<tbl_df[,4]>> <list>
#> 1 Afghanistan Asia [12 x 4] <lm>
#> 2 Albania Europe [12 x 4] <lm>
#> 3 Algeria Africa [12 x 4] <lm>
#> 4 Angola Africa [12 x 4] <lm>
#> 5 Argentina Americas [12 x 4] <lm>
#> 6 Australia Oceania [12 x 4] <lm>
#> 7 Austria Europe [12 x 4] <lm>
#> 8 Bahrain Asia [12 x 4] <lm>
#> 9 Bangladesh Asia [12 x 4] <lm>
#> 10 Belgium Europe [12 x 4] <lm>
#> # ... with 132 more rows
Created on 2020-06-07 by the reprex package (v0.3.0)
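To show what the nested models are good for afterwards, here is a minimal sketch that pulls one number out of each model. It uses the built-in iris data instead of gapminder so it runs without extra data packages; the helper species_model and the choice of regressing Sepal.Length on Sepal.Width are my own illustrative assumptions, not something from r4ds. The nest_by variant assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper for this sketch: one regression per nested data frame.
species_model <- function(df) {
  lm(Sepal.Length ~ Sepal.Width, data = df)
}

# group_by() + nest(): the models sit in a list-column, so iterate with purrr.
slopes_map <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(
    model = map(data, species_model),
    slope = map_dbl(model, ~ coef(.x)[["Sepal.Width"]])
  ) %>%
  ungroup() %>%
  select(Species, slope)

# nest_by(): the result is rowwise, so `data` and `model` refer to a single
# element per row and no map() call is needed.
slopes_rowwise <- iris %>%
  nest_by(Species) %>%
  mutate(model = list(species_model(data))) %>%
  summarise(slope = coef(model)[["Sepal.Width"]], .groups = "drop")

slopes_map
slopes_rowwise
```

Both pipelines should give the same three slopes. The point is that the data, the fitted model and anything derived from it stay together in one row per group, which a plain group_by cannot hold because lm objects are not atomic values.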
Also, here is the new way to nest iris by Species without listing every included column:
library(tidyverse)
iris %>%
  group_by(Species) %>%
  nest()
#> # A tibble: 3 x 2
#> # Groups: Species [3]
#> Species data
#> <fct> <list>
#> 1 setosa <tibble [50 x 4]>
#> 2 versicolor <tibble [50 x 4]>
#> 3 virginica <tibble [50 x 4]>
Created on 2020-06-07 by the reprex package (v0.3.0)
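If you would rather keep the shape of the original call, a sketch along these lines should also avoid the warning in tidyr 1.0.0 and later: name the new list-column and give the negative selection as its value. The column name data is just the conventional choice, and nested is a variable name I made up for the example.

```r
library(dplyr)
library(tidyr)

# Nest every column except Species into a list-column called `data`.
nested <- as_tibble(iris) %>%
  nest(data = -Species)

nested

# unnest() reverses it, giving back all 150 rows with Species first.
restored <- nested %>%
  unnest(data)
```

Unlike group_by + nest, this form returns an ungrouped tibble, so there is no need to call ungroup() before further steps.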