I have found that the following works:
iris %>%
  select(Sepal.Length) %>%
  modelr::bootstrap(100) %>%
  mutate(mean = map(strap, mean))
but the below does not:
iris %>%
  select(Sepal.Length) %>%
  modelr::bootstrap(100) %>%
  mutate(median = map(strap, median))
The only difference is that the second block of code uses median instead of mean.
The error I get is
Error in mutate_impl(.data, dots) : Evaluation error: unimplemented type 'list' in 'greater' .
The mean version looks like it's working, but if you unnest it, you're actually just getting a lot of NAs because you're trying to take the mean of a resample object, which is a classed list with a reference to the data resampled and the indices for the particular resample. Taking the mean of such a list is not useful, so returning NA with a warning is helpful behavior. To get the code to work, coerce the resample to a data frame, which you can operate upon as usual within map's anonymous function.
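To see what is going on, it can help to look at a single resample directly. The following is a minimal sketch, assuming modelr is installed; the names boot, rs, and median_result are just illustrative. It shows that the object is a two-element list of class resample, that mean() returns NA with a warning, and that median() fails outright, most likely because it has to sort its input and a list cannot be sorted.

```r
library(modelr)

set.seed(47)
boot <- bootstrap(iris["Sepal.Length"], 3)
rs <- boot$strap[[1]]   # one resample object (illustrative name)

class(rs)
#> [1] "resample"
names(unclass(rs))      # the underlying list: the data plus row indices
#> [1] "data" "idx"

# mean() gives up on non-numeric input: NA plus a warning
mean(rs)
#> [1] NA

# median() does not return NA; it throws an error instead
median_result <- try(median(rs), silent = TRUE)
inherits(median_result, "try-error")
#> [1] TRUE

# Coercing to a data frame materialises the resampled rows
resampled <- as.data.frame(rs)
mean(resampled$Sepal.Length)
median(resampled$Sepal.Length)
```

The exact wording of the median() error may differ between R versions, which is why the sketch only checks that an error occurred.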
For a direct route, extract the data and take the mean, simplifying the list to a numeric vector with map_dbl:
library(tidyverse)
set.seed(47)
iris %>%
  select(Sepal.Length) %>%
  modelr::bootstrap(100) %>%
  mutate(sepal_mean = map_dbl(strap, ~mean(as_data_frame(.x)$Sepal.Length)))
#> # A tibble: 100 x 3
#> strap .id sepal_mean
#> <list> <chr> <dbl>
#> 1 <S3: resample> 001 5.844000
#> 2 <S3: resample> 002 6.016000
#> 3 <S3: resample> 003 5.851333
#> 4 <S3: resample> 004 5.869333
#> 5 <S3: resample> 005 5.840667
#> 6 <S3: resample> 006 5.825333
#> 7 <S3: resample> 007 5.824000
#> 8 <S3: resample> 008 5.790000
#> 9 <S3: resample> 009 5.858000
#> 10 <S3: resample> 010 5.810000
#> # ... with 90 more rows
Translating this approach to median works fine:
iris %>%
  select(Sepal.Length) %>%
  modelr::bootstrap(100) %>%
  mutate(sepal_median = map_dbl(strap, ~median(as_data_frame(.x)$Sepal.Length)))
#> # A tibble: 100 x 3
#> strap .id sepal_median
#> <list> <chr> <dbl>
#> 1 <S3: resample> 001 5.9
#> 2 <S3: resample> 002 5.8
#> 3 <S3: resample> 003 5.8
#> 4 <S3: resample> 004 5.7
#> 5 <S3: resample> 005 5.7
#> 6 <S3: resample> 006 5.8
#> 7 <S3: resample> 007 5.8
#> 8 <S3: resample> 008 5.7
#> 9 <S3: resample> 009 5.8
#> 10 <S3: resample> 010 5.7
#> # ... with 90 more rows
If you'd like both median and mean, you could repeatedly coerce the resample to a data frame, or store the data frame in another column, but neither approach is very efficient. It's better to use map to return a list of one-row summary data frames that can then be unnested:
iris %>%
  select(Sepal.Length) %>%
  modelr::bootstrap(100) %>%
  mutate(stats = map(strap, ~summarise_all(as_data_frame(.x), funs(mean, median)))) %>%
  unnest(stats)
#> # A tibble: 100 x 4
#> strap .id mean median
#> <list> <chr> <dbl> <dbl>
#> 1 <S3: resample> 001 5.744667 5.60
#> 2 <S3: resample> 002 5.725333 5.70
#> 3 <S3: resample> 003 5.808667 5.70
#> 4 <S3: resample> 004 5.809333 5.70
#> 5 <S3: resample> 005 5.964000 5.85
#> 6 <S3: resample> 006 5.931333 5.95
#> 7 <S3: resample> 007 5.838667 5.80
#> 8 <S3: resample> 008 5.926000 5.95
#> 9 <S3: resample> 009 5.855333 5.75
#> 10 <S3: resample> 010 5.888667 5.70
#> # ... with 90 more rows
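Note that as_data_frame() and funs() have since been deprecated in tibble and dplyr. On more recent versions, something like the following sketch should give the same kind of result without the deprecation warnings. It assumes current dplyr, purrr, tidyr, and modelr; the names boot_stats, sepal_mean, and sepal_median are just illustrative, and it swaps in base as.data.frame() and a plain summarise() call for the coercion and summary steps.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(modelr)

set.seed(47)

boot_stats <- iris %>%
  select(Sepal.Length) %>%
  bootstrap(100) %>%
  # Coerce each resample once, then compute both statistics on it
  mutate(stats = map(strap, ~ summarise(
    as.data.frame(.x),
    sepal_mean = mean(Sepal.Length),
    sepal_median = median(Sepal.Length)
  ))) %>%
  # Expand the one-row summaries into regular columns
  unnest(stats)

boot_stats
```

Each resample is still coerced to a data frame only once, so this keeps the efficiency benefit of the approach above.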