
Why do different pROC functions give different 95% CI values?


I'm using the pROC package to calculate the specificity and its 95% CI at the "best" threshold. My code is:

library(pROC)
data(aSAH)
myroc <- roc(aSAH$outcome, aSAH$s100b)
ci.thresholds(myroc, thresholds = "best")

95% CI (2000 stratified bootstrap replicates):
 thresholds sp.low sp.median sp.high se.low se.median se.high
      0.205 0.7083    0.8056  0.8889 0.4878    0.6341  0.7805

The value I get from ci.coords is:

ci.coords(myroc, x = "best", ret = c("specificity"))
95% CI (2000 stratified bootstrap replicates):
 threshold specificity.low specificity.median specificity.high
      best          0.6663             0.8194           0.9865

And the values from ci.thresholds for all thresholds are:

ci.thresholds(myroc)
95% CI (2000 stratified bootstrap replicates):
 thresholds  sp.low sp.median sp.high se.low se.median se.high
       -Inf 0.00000    0.0000  0.0000 1.0000    1.0000  1.0000
      0.065 0.06944    0.1389  0.2222 0.9268    0.9756  1.0000
      0.075 0.12500    0.2222  0.3194 0.8049    0.9024  0.9756
      0.085 0.19440    0.3056  0.4167 0.7805    0.8780  0.9756
      0.095 0.27780    0.3889  0.5000 0.7073    0.8293  0.9268
      0.105 0.37500    0.4861  0.5972 0.6579    0.7805  0.9024
      0.115 0.43060    0.5417  0.6528 0.6098    0.7561  0.8780
      0.135 0.47220    0.5833  0.6944 0.5366    0.6829  0.8293
      0.155 0.58330    0.6944  0.7917 0.5122    0.6585  0.8049
      0.205 0.70830    0.8056  0.8889 0.4878    0.6341  0.7805
      0.245 0.72220    0.8194  0.9028 0.4390    0.5854  0.7317
      0.290 0.75000    0.8333  0.9167 0.3659    0.5122  0.6585
      0.325 0.76390    0.8472  0.9306 0.3171    0.4634  0.6098
      0.345 0.79170    0.8750  0.9444 0.2927    0.4390  0.5854
      0.395 0.81910    0.8889  0.9583 0.2683    0.4146  0.5610
      0.435 0.83330    0.9028  0.9583 0.2439    0.3902  0.5366
      0.475 0.90280    0.9583  1.0000 0.1951    0.3415  0.4878
      0.485 0.93060    0.9722  1.0000 0.1707    0.3171  0.4634
      0.510 1.00000    1.0000  1.0000 0.1707    0.2927  0.4390

With ci.thresholds(myroc, thresholds = "best"), the threshold is 0.205 and the specificity is 0.8056. But ci.coords(myroc, x = "best", ret = c("specificity")) gives a specificity of 0.8194, which matches the row for threshold 0.245 in the ci.thresholds(myroc) output. Why do the two functions not report the same threshold?

In addition, the specificity from ci.coords(myroc, x = "best", ret = c("specificity")) was 0.8194 with a 95% CI of 0.6806-0.9861, whereas ci.thresholds(myroc) gives 0.8194 with a 95% CI of 0.7222-0.9028.

update:

> coords(myroc, x = "best", ret="all", transpose = FALSE)
          threshold specificity sensitivity  accuracy tn tp fn fp       npv  ppv  fdr       fpr       tpr       tnr
threshold     0.205   0.8055556   0.6341463 0.7433628 58 26 15 14 0.7945205 0.65 0.35 0.1944444 0.6341463 0.8055556
                fnr 1-specificity 1-sensitivity 1-accuracy     1-npv 1-ppv precision    recall   youden
threshold 0.3658537     0.1944444     0.3658537  0.2566372 0.2054795  0.35      0.65 0.6341463 1.439702
          closest.topleft
threshold       0.1716575



> ci.coords(myroc, x = "best", ret = "all", transpose = TRUE)
95% CI (2000 stratified bootstrap replicates):
     threshold threshold.low threshold.median threshold.high specificity.low specificity.median specificity.high
best      best          0.12            0.205           0.51          0.6663             0.8194                1
     sensitivity.low sensitivity.median sensitivity.high accuracy.low accuracy.median accuracy.high tn.low tn.median
best          0.3902             0.6341           0.8049       0.6637          0.7522         0.823  47.98        59
     tn.high tp.low tp.median tp.high fn.low fn.median fn.high fp.low fp.median fp.high npv.low npv.median npv.high
best      72     16        26      33      8        15      25      0        13   24.02  0.7273     0.7973   0.8732
     ppv.low ppv.median ppv.high fdr.low fdr.median fdr.high fpr.low fpr.median fpr.high tpr.low tpr.median tpr.high
best  0.5366     0.6667        1       0     0.3333   0.4634       0     0.1806   0.3337  0.3902     0.6341   0.8049
     tnr.low tnr.median tnr.high fnr.low fnr.median fnr.high 1-specificity.low 1-specificity.median 1-specificity.high
best  0.6663     0.8194        1  0.1951     0.3659   0.6098                 0               0.1806             0.3337
     1-sensitivity.low 1-sensitivity.median 1-sensitivity.high 1-accuracy.low 1-accuracy.median 1-accuracy.high
best            0.1951               0.3659             0.6098          0.177            0.2478          0.3363
     1-npv.low 1-npv.median 1-npv.high 1-ppv.low 1-ppv.median 1-ppv.high precision.low precision.median precision.high
best    0.1268       0.2027     0.2727         0       0.3333     0.4634        0.5366           0.6667              1
     recall.low recall.median recall.high youden.low youden.median youden.high closest.topleft.low
best     0.3902        0.6341      0.8049      1.279         1.447        1.61             0.08148
     closest.topleft.median closest.topleft.high
best                 0.1717               0.4021

The specificity is 0.8055556 from coords and 0.8194 from ci.coords, and several other values in the output above differ as well.


Solution

  • When you run

    ci.coords(myroc, x = "best" [...]
    

    you are effectively computing the confidence interval of the best threshold itself.

    Internally, pROC resamples the data, determines the best threshold on the resampled curve, calculates the coordinates at that threshold, and repeats this 2000 times. This is different from fixing the threshold at whatever point is best on the full ROC curve and then resampling at that given threshold.

    You can see this if you focus on the threshold confidence interval:

    ci.coords(myroc, x = "best", ret = "all", transpose = TRUE)
    95% CI (2000 stratified bootstrap replicates):
         threshold threshold.low threshold.median threshold.high [...]
    best      best          0.12            0.205           0.51
    

    See how the "best" threshold varies around 0.205, between 0.12 and 0.51? As a consequence all the coordinates will have wider confidence intervals too.
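
    The two resampling strategies can be sketched by hand. This is only an illustration under stated assumptions, not pROC's internal code: the helpers resample_stratified and boot_roc are hypothetical names, 500 replicates are used instead of 2000 to keep it quick, and ties for the best threshold are resolved by taking the first point.

    ```r
    library(pROC)
    data(aSAH)
    set.seed(1)  # bootstrap results change from run to run without a seed

    # Hypothetical helper: one stratified resample (each outcome group is
    # resampled separately, with replacement).
    resample_stratified <- function(d) {
      groups <- split(seq_len(nrow(d)), d$outcome)
      idx <- unlist(lapply(groups, function(i) i[sample.int(length(i), replace = TRUE)]),
                    use.names = FALSE)
      d[idx, ]
    }

    # Hypothetical helper: ROC curve on a resample, with levels and direction
    # fixed so that every replicate is oriented the same way.
    boot_roc <- function(d) {
      roc(d$outcome, d$s100b, levels = c("Good", "Poor"), direction = "<", quiet = TRUE)
    }

    n_boot <- 500  # fewer than pROC's default of 2000, to keep the sketch fast

    # Option 1: re-select the "best" threshold on every resampled curve.
    sp_reselected <- replicate(n_boot, {
      r <- boot_roc(resample_stratified(aSAH))
      # Assumption of this sketch: if several points tie for best, take the first.
      coords(r, x = "best", ret = "specificity", transpose = FALSE)$specificity[1]
    })

    # Option 2: keep the threshold fixed at 0.205, the best point on the full curve.
    sp_fixed <- replicate(n_boot, {
      r <- boot_roc(resample_stratified(aSAH))
      coords(r, x = 0.205, input = "threshold", ret = "specificity",
             transpose = FALSE)$specificity
    })

    quantile(sp_reselected, c(0.025, 0.5, 0.975))
    quantile(sp_fixed, c(0.025, 0.5, 0.975))
    ```

    The interval from the first option would typically come out wider, because it also absorbs the variability of the selected threshold.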

    The ci.thresholds function behaves differently, and uses the second option I mentioned above, setting the "best" threshold on the full ROC curve:

    ci.thresholds(myroc, thresholds = "best")
    
    95% CI (2000 stratified bootstrap replicates):
     thresholds 
          0.205
    

    See how there is no confidence interval around the threshold? It is set before resampling. You can get the same behavior with ci.coords if you set x to a numeric threshold (the one that happens to be best on the full ROC curve, i.e. 0.205 here):

    > ci.coords(myroc, x = 0.205)
    95% CI (2000 stratified bootstrap replicates):
          threshold threshold.low threshold.median threshold.high specificity.low specificity.median specificity.high sensitivity.low sensitivity.median sensitivity.high
    0.205     0.205         0.205            0.205          0.205          0.7083             0.8056           0.8889          0.4878             0.6341           0.7805
    

    You can see that the threshold is not resampled (the confidence interval does not vary around the 0.205 value) and the confidence intervals are similar to those obtained with ci.thresholds.
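
    To avoid typing the threshold by hand, it can be looked up with coords first and then passed on as a number. A minimal sketch, assuming a pROC version where coords accepts transpose = FALSE and returns a data frame (as in the question's output); the variable names best, ci_fixed and ci_thr are illustrative:

    ```r
    library(pROC)
    data(aSAH)
    myroc <- roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)

    # Best threshold on the full ROC curve (0.205 for these data).
    best <- coords(myroc, x = "best", ret = "threshold", transpose = FALSE)$threshold

    set.seed(1)  # bootstrap results change from run to run without a seed

    # Threshold fixed before resampling: no interval around the threshold itself.
    ci_fixed <- ci.coords(myroc, x = best, input = "threshold",
                          ret = c("specificity", "sensitivity"))

    # Same idea through ci.thresholds, given the numeric threshold directly.
    ci_thr <- ci.thresholds(myroc, thresholds = best)

    ci_fixed
    ci_thr
    ```

    Because both calls bootstrap independently, the two intervals should be close but not necessarily identical.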

    I realize this could be better documented in ?ci.coords and will aim to do that in a future release.