statistics-bootstrapaucproc-r-package

How to find the number of samples picked in each replicate of the stratified bootstrap in pROC?


This question is about the roc function of the pROC package. Package link: https://www.rdocumentation.org/packages/pROC/versions/1.18.5/topics/roc. Paper link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3068975/.

I am plotting confidence intervals on my ROC plot:

pROC::roc(as.factor(df$classes),
                   df$score ,
                   plot=TRUE,
                   ci = TRUE,
                   ci.method= 'bootstrap',
                   auc.polygon=TRUE,
                   max.auc.polygon=TRUE,
                   print.auc=TRUE)

I understand from the documentation that when ci.method is 'bootstrap', stratified bootstrapping takes place. How can I find the subsampling percentage used while bootstrapping? Is it 80% of the total data, 70%, or something else? Can we specify it?

The paper states, "Bootstrap is stratified by default; in this case the same number of case and control observations than in the original sample will be selected in each bootstrap replicate." I think they meant the same proportion of case and control observations in each replicate. However, the subsampling percentage is not mentioned anywhere.

If I am interpreting it wrong, please correct me.


Solution

  • By definition, bootstrapping resamples the whole data set with replacement, so each replicate has as many observations as the original data.

    If ci is TRUE and ci.method is "bootstrap", then the roc function eventually calls stratified.ci.auc, which looks like this:

    pROC:::stratified.ci.auc
    

    function (n, roc) 
    {
        controls <- sample(roc$controls, replace = TRUE)
        cases <- sample(roc$cases, replace = TRUE)
        thresholds <- roc_utils_thresholds(c(cases, controls), roc$direction)
        perfs <- roc$fun.sesp(thresholds = thresholds, controls = controls, 
            cases = cases, direction = roc$direction)
        roc$sensitivities <- perfs$se
        roc$specificities <- perfs$sp
        auc.roc(roc, partial.auc = attr(roc$auc, "partial.auc"), 
            partial.auc.focus = attr(roc$auc, "partial.auc.focus"), 
            partial.auc.correct = attr(roc$auc, "partial.auc.correct"), 
            allow.invalid.partial.auc.correct = TRUE)
    }
    

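    The first two lines tell you that each stratified resample contains exactly as many controls and as many cases as the original data: no size argument is given, so sample() draws length(x) values. Because the draws are made with replacement, some observations appear more than once in a replicate and others not at all, as in this example: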


    > sample(1:10, replace=TRUE)
    # [1]  1 10  6  2  7  3  3  6  4  6
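
To see the same behaviour separately for each stratum, here is a minimal sketch in base R. The toy scores and the variable names are made up for illustration; only the two sample() calls mirror the first two lines of stratified.ci.auc.

```r
set.seed(42)  # arbitrary seed, only so the toy data are reproducible

# Hypothetical toy scores: 30 controls and 12 cases
controls <- rnorm(30, mean = 0)
cases    <- rnorm(12, mean = 1)

# One stratified bootstrap replicate: no size argument,
# so sample() draws length(x) values from each stratum
boot_controls <- sample(controls, replace = TRUE)
boot_cases    <- sample(cases, replace = TRUE)

length(boot_controls)  # 30, same as the original number of controls
length(boot_cases)     # 12, same as the original number of cases

# Number of distinct original observations that made it into the replicate;
# this is usually smaller than the stratum size because draws repeat
n_unique_controls <- length(unique(boot_controls))
n_unique_cases    <- length(unique(boot_cases))
```

So the replicate size is 100% of each class, but the set of distinct observations it contains is typically smaller than that.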
    

    Note that n in the function above is not the size of the samples; it is never used in the function body. The number of bootstrap replicates is set by boot.n, which defaults to 2000.
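
The function shown above has no argument for changing the resample size. If you want to experiment with a different size, one option is to write the stratified bootstrap by hand. The following is only a sketch under stated assumptions: auc_mw, stratified_boot_auc and the frac argument are hypothetical names that are not part of pROC, the AUC is computed with the Mann-Whitney statistic, and cases are assumed to score higher than controls.

```r
# AUC as the Mann-Whitney statistic: P(case > control) + 0.5 * P(tie).
# Assumes higher scores indicate cases.
auc_mw <- function(controls, cases) {
  diffs <- outer(cases, controls, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# Hypothetical helper, not a pROC function.
# frac = 1 reproduces the usual bootstrap (resample size = stratum size).
stratified_boot_auc <- function(controls, cases, boot_n = 2000, frac = 1) {
  n_controls <- max(1, round(frac * length(controls)))
  n_cases    <- max(1, round(frac * length(cases)))
  replicate(boot_n, {
    # Sample indices rather than values, so a stratum of length 1 is safe
    b_controls <- controls[sample.int(length(controls), n_controls, replace = TRUE)]
    b_cases    <- cases[sample.int(length(cases), n_cases, replace = TRUE)]
    auc_mw(b_controls, b_cases)
  })
}

set.seed(1)  # arbitrary seed for the toy data
controls <- rnorm(50, mean = 0)
cases    <- rnorm(20, mean = 1)

aucs <- stratified_boot_auc(controls, cases, boot_n = 500)
quantile(aucs, c(0.025, 0.5, 0.975))  # percentile interval and median
```

With frac below 1 the replicates are smaller than the original data, so the spread of the resulting AUCs no longer reflects the variability at the original sample size without further correction. That is one reason to keep the default of resampling each class at its full size.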