Tags: r, runtime-error, statistics-bootstrap, proc-r-package

R: Error "incorrect number of subscripts on matrix" when using boot with roc


I am using RStudio and trying to use roc from the pROC package with boot for bootstrapping. I am following the code on this link. The code from that link uses a different function with boot, which works fine, but when I try roc, it gives an error.

Below is my code. (In the output I print the dimensions of each sample to see how many times the function is evaluated. Here R=5; the function runs 6 times and then the error occurs.)

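library(boot)
library(pROC)   # provides roc()

# Statistic passed to boot(): D is the data, d the (resampled) row indices
roc_boot <- function(D, d) {
  E = D[d, ]
  print(dim(E))            # show the dimensions of each sample
  return(roc(E$x, E$y))    # returns a whole "roc" object, not a number
}

x = round(runif(100))      # binary response (0/1)
y = runif(100)             # predictor
D = data.frame(x, y)

b = boot(D, roc_boot, R=5)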

Output:

[1] 100   2
[1] 100   2
[1] 100   2
[1] 100   2
[1] 100   2
[1] 100   2
Error in boot(D, roc_boot, R = 5) : 
  incorrect number of subscripts on matrix

What is the problem here?

If I replace roc with another function such as sum, it works perfectly (it prints the 6 lines without any error). It also gives different answers when I run boot multiple times (keeping D the same).
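The 6 printed lines with R=5 are expected: boot evaluates the statistic once on the original data (stored as b$t0) and then once per resample (stored as the rows of b$t). A minimal sketch, using a hypothetical counter n_calls and an illustrative numeric statistic mean_y; set.seed is only there to make the resamples reproducible, which also explains why repeated runs otherwise give different answers.

```r
library(boot)

set.seed(42)                     # reproducible resamples
D <- data.frame(x = round(runif(100)), y = runif(100))

n_calls <- 0                     # hypothetical counter of statistic evaluations
mean_y <- function(D, d) {
  n_calls <<- n_calls + 1        # increment the counter in the global environment
  mean(D$y[d])                   # a plain numeric value, which boot can store
}

b <- boot(D, mean_y, R = 5)
n_calls        # 6: once on the original data (b$t0) plus R = 5 resamples
dim(b$t)       # 5 1: one row per bootstrap replicate
```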

Please note that the error occurs after all the resampling is done. I cannot find the source of this particular error. I have looked at other answers like this, but they don't seem to apply to my case. Can someone also explain why this error occurs and what it means in general?
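In general, the message is not specific to boot or pROC: R raises it when an assignment uses matrix-style x[i, j] subscripts on an object that has no two-dimensional dim attribute. The base-R sketch below (the names v, msg and m are illustrative) reproduces it; it does not show the exact line inside boot where a roc object triggers it.

```r
# A plain vector has no "dim" attribute, so it is not a matrix
v <- c(1, 2, 3)

# Matrix-style assignment v[i, j] <- value on it fails
msg <- tryCatch({
  v[1, 1] <- 0
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# The same kind of assignment on a real matrix works
m <- matrix(0, nrow = 2, ncol = 3)
m[1, ] <- c(1, 2, 3)
print(m)
```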

EDIT: I returned only the area under the curve, using the following function:

roc_boot <- function(D, d) {
  E=D[d,]
  objectROC <- roc(E$x,E$y)
  return(objectROC$auc)
}

This gives the area under the curve, but it is the same as the answer without bootstrapping, meaning there is no improvement. I need to pass the whole roc object to get any improvement from bootstrapping.
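The AUC that boot reports as the original statistic matches the non-bootstrapped one because b$t0 is, by definition, the statistic evaluated on the original data; the resampled values are in b$t. A sketch, assuming a pROC version whose roc() accepts quiet. Fixing levels and direction follows pROC's advice to set the direction explicitly when resampling; direction = "<" is an assumption that cases have higher predictor values. auc_boot and full_auc are illustrative names.

```r
library(boot)
library(pROC)

set.seed(42)
D <- data.frame(x = round(runif(100)), y = runif(100))

# Statistic returning a single number: the AUC of the resampled rows.
# levels/direction are fixed so that direction = "auto" is not re-chosen
# on every resample (assumption: cases have higher predictor values).
auc_boot <- function(D, d) {
  E <- D[d, ]
  r <- roc(E$x, E$y, levels = c(0, 1), direction = "<", quiet = TRUE)
  as.numeric(auc(r))
}

b <- boot(D, auc_boot, R = 200)

# t0 is the statistic on the original data, so it equals the plain AUC
full_auc <- as.numeric(auc(roc(D$x, D$y, levels = c(0, 1),
                               direction = "<", quiet = TRUE)))
all.equal(b$t0, full_auc)

# The bootstrap information lives in b$t: one AUC per resample
summary(b$t[, 1])
sd(b$t[, 1])     # bootstrap estimate of the AUC's standard error
```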


Solution

  • It turns out you can't return a roc object from the statistic function passed to boot; it has to return a numeric value (boot stores every replicate as a row of the matrix b$t). So the following modification gets rid of the error (as in the edit to the question):

    roc_boot <- function(D, d) {
      E=D[d,]
      objectROC <- roc(E$x,E$y)
      return(objectROC$auc)
    }
    

    Moreover, as @Calimo pointed out, bootstrapping does not change the AUC estimate itself (b$t0 is simply the AUC of the original data); it only helps with estimating its confidence interval. In my case, there is a slight improvement in the confidence interval.
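What bootstrapping adds is an interval around the unchanged point estimate. A minimal sketch of a percentile confidence interval with boot.ci, assuming a pROC version whose roc() accepts quiet and fixing levels = c(0, 1) and direction = "<" (an assumption that cases have higher predictor values). For comparison it also calls pROC's own ci.auc() with method = "bootstrap"; the two intervals need not match exactly, since pROC stratifies its resamples by default.

```r
library(boot)
library(pROC)

set.seed(42)
D <- data.frame(x = round(runif(100)), y = runif(100))

# Numeric-returning statistic: AUC of the resampled rows
auc_boot <- function(D, d) {
  E <- D[d, ]
  as.numeric(auc(roc(E$x, E$y, levels = c(0, 1), direction = "<", quiet = TRUE)))
}

b <- boot(D, auc_boot, R = 1000)

# Percentile bootstrap CI for the AUC; the point estimate is still b$t0
ci_boot <- boot.ci(b, conf = 0.95, type = "perc")
ci_boot$percent[4:5]    # lower and upper limits

# pROC can also bootstrap the AUC confidence interval itself
r_full <- roc(D$x, D$y, levels = c(0, 1), direction = "<", quiet = TRUE)
ci.auc(r_full, method = "bootstrap", boot.n = 1000)
```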