I used the rsample::bootstraps() function to create a nested object as follows:
Sampled_Data <- bootstraps(credit_data, times = 2, strata = "Home", apparent = TRUE)
This is what I get:
splits id
<list> <chr>
1 <split [34338/12635]> Bootstrap1
2 <split [34338/12592]> Bootstrap2
3 <split [34338/34338]> Apparent
I would like to compute the Gini index from the columns "Status" and "Expenses" for every bootstrapped data frame, like this:
library(pROC)
2*auc(credit_data$Status,credit_data$Expenses)-1
The problem is that I don't know how to do this without unnesting and writing a for loop.
It seems that the purrr package could be useful here, but I'm not familiar with it.
What I would like to have:
splits id Gini
<list> <chr> <dbl>
1 <split [34338/12635]> Bootstrap1 x
2 <split [34338/12592]> Bootstrap2 y
3 <split [34338/34338]> Apparent z
Any help?
Thanks
I'll assume that you want to bootstrap this to get confidence intervals.
You only need apparent = TRUE for some types of intervals (for example, int_t() and int_bca()),
so I'll omit it here.
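For reference, the Apparent row in the question's output comes from apparent = TRUE: it is one extra resample whose analysis set is the entire data set, which is why it prints as [34338/34338]. A minimal sketch of that behavior, assuming modeldata's credit_data (4454 rows) rather than the asker's own data:

```r
library(rsample)
library(modeldata)
data("credit_data")

set.seed(1)
# apparent = TRUE appends one extra resample built from the full data set
bt <- bootstraps(credit_data, times = 2, apparent = TRUE)

# Two bootstrap ids followed by "Apparent"
bt$id

# The analysis set of the "Apparent" row is the original data, same number of rows
nrow(analysis(bt$splits[[3]])) == nrow(credit_data)
```

A statistic computed on that row is simply the full-data estimate, so it is not a bootstrap replicate in its own right.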
library(tidymodels)
tidymodels_prefer()
data("credit_data")
# See ?int_pctl and
# https://www.tidymodels.org/learn/statistics/bootstrap
# for more info.
get_gini <- function(split) {
  dat <- analysis(split)
  roc_res <- roc_auc(dat, truth = Status, Expenses)
  # Convert the ROC AUC to the Gini statistic: Gini = 2 * AUC - 1
  roc_res %>%
    mutate(
      .metric = "gini",
      .estimate = 2 * .estimate - 1
    ) %>%
    # now use the same format as `tidy()` (columns `term` and `estimate`),
    # which is what int_pctl() expects
    select(estimate = .estimate, term = .metric)
}
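To see what the 2 * .estimate - 1 conversion in get_gini() does, note that yardstick treats the first factor level of the truth column as the event by default (event_level = "first"). Assuming modeldata's credit_data has Status levels "bad" then "good", the AUC measures how well higher Expenses rank bad applicants above good ones. A toy check of the sign convention, using made-up data (the toy tibble and its scores are hypothetical):

```r
library(yardstick)
library(tibble)

# Hypothetical toy data: "bad" is the first level, so it is the event by default
toy <- tibble(
  Status = factor(c("bad", "bad", "good", "good"), levels = c("bad", "good")),
  score  = c(0.9, 0.8, 0.2, 0.1)
)

# Every "bad" case scores higher than every "good" case: AUC = 1, so Gini = 1
auc_perfect  <- roc_auc_vec(toy$Status, toy$score)
gini_perfect <- 2 * auc_perfect - 1

# Reversing the scores puts every "good" case on top: AUC = 0, so Gini = -1
auc_reversed  <- roc_auc_vec(toy$Status, 1 - toy$score)
gini_reversed <- 2 * auc_reversed - 1

c(perfect = gini_perfect, reversed = gini_reversed)
```

The Gini therefore ranges from -1 to 1, and the estimate of about -0.002 reported by int_pctl() in this answer means Expenses has essentially no ability to separate bad from good applicants in either direction.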
set.seed(1)
# Use times = 1000 or more for real bootstrap intervals; 50 keeps this example fast
bts <-
  bootstraps(credit_data, times = 50) %>%
  mutate(gini = map(splits, get_gini))
int_pctl(bts, gini)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for term
#> `gini`.
#> # A tibble: 1 × 6
#> term .lower .estimate .upper .alpha .method
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 gini -0.0463 -0.00173 0.0377 0.05 percentile
Created on 2023-07-17 with reprex v2.0.2
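If you want exactly the table from the question (one Gini value per row, including the Apparent resample) rather than an interval, the same idea works with purrr::map_dbl(), which returns a plain numeric column instead of a list of tibbles. This is a sketch assuming modeldata's credit_data; substitute your own data and add strata = Home back if you need stratification. gini_of_split() is a hypothetical helper name. It uses yardstick::roc_auc_vec() instead of pROC::auc(), because pROC's default direction = "auto" picks the comparison direction from the group medians of each sample, so the direction (and hence the sign of the Gini) could change between resamples, whereas yardstick always uses the first factor level as the event.

```r
library(tidymodels)  # attaches rsample, dplyr, purrr, yardstick and modeldata
data("credit_data")

# Hypothetical helper: Gini = 2 * AUC - 1 on the analysis (in-bag) set of one split
gini_of_split <- function(split) {
  dat <- analysis(split)
  2 * roc_auc_vec(dat$Status, dat$Expenses) - 1
}

set.seed(1)
sampled_data <-
  bootstraps(credit_data, times = 2, apparent = TRUE) %>%
  # map_dbl() applies gini_of_split() to every split and returns a numeric column
  mutate(Gini = map_dbl(splits, gini_of_split))

sampled_data
```

The Apparent row's Gini is just the full-data value, i.e. the same number as 2 * roc_auc_vec(credit_data$Status, credit_data$Expenses) - 1.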