After ROC analysis of a set of data, how to calculate p-value? With the same statistics, I saw that the p-value can be output in SPSS. The sample code is as follows:
library(pROC)
data(aSAH)
head(aSAH)
# gos6 outcome gender age wfns s100b ndka
# 29 5 Good Female 42 1 0.13 3.01
# 30 5 Good Female 37 1 0.14 8.54
# 31 5 Good Female 42 1 0.10 8.09
# 32 5 Good Female 27 1 0.04 10.42
# 33 1 Poor Female 42 3 0.13 17.40
# 34 1 Poor Male 48 2 0.10 12.75
(rr <- roc(aSAH$outcome, aSAH$s100b, plot=T))
# Setting levels: control = Good, case = Poor
# Setting direction: controls < cases
#
# Call:
# roc.default(response = aSAH$outcome, predictor = aSAH$s100b, plot = F)
#
# Data: aSAH$s100b in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
# Area under the curve: 0.7314
The p value calculated in SPSS is 0.000007, but the p-value calculated by verification::roc.area()
is 0.000022546, is the calculation method of roc.area()
and SPSS inconsistent?
levels(aSAH$outcome) <- c(0, 1)
library(verification)
ra <- roc.area(as.numeric(as.vector(aSAH$outcome)), rr$predictor)
ra$p.value
# [1] 0.00002254601
There's no option to get the p-value in pROC::roc
, you may set option ci=TRUE
to get confidence intervals instead. pROC::roc
yields an invisible output that you may grab by assigning it to an object.
library(pROC)
data(aSAH)
rr <- pROC::roc(aSAH$outcome, aSAH$s100b, ci=TRUE)
Using str(rr)
reveals how to access the ci
:
rr$ci
# 95% CI: 0.6301-0.8326 (DeLong)
So you have already a confidence interval.
Moreover, you can also get the variance, using pROC::var
*, from which you could calculate a standard error manually.
(v <- var(rr))
# [1] 0.002668682
b <- rr$auc - .5
se <- sqrt(v)
(se <- sqrt(v))
# [1] 0.05165929
* Note, that there is also a bootstrap option pROC::var(rr, method="bootstrap")
.
This is identical to the one calculated by Stata,
# . roctab outcome_num s100b, summary
#
# ROC -Asymptotic Normal--
# Obs Area Std. Err. [95% Conf. Interval]
# ------------------------------------------------------------
# 113 0.7314 0.0517 0.63012 0.83262
# .
# . display r(se)
# .05165929
where Stata Base Reference Manual 14 - roctab
(p. 2329) states:
By default,
roctab
calculates the standard error for the area under the curve by using an algorithm suggested by DeLong, DeLong, and Clarke-Pearson (1988) and asymptotic normal confidence intervals.
Once we have the standard error, we also may calculate a p-value based on the z-distribution (Ref.).
z <- (b / se)
2 * pt(-abs(z), df=Inf) ## two-sided test
# [1] 0.000007508474
This p-value is close to your SPSS value, so it's likely that it's calculated with an algorithm similar to Stata (compare: IBM SPSS Statistics 24 Algorithms, p. 888:889).
However, the calculation of the p value of a ROC analysis might be controversial. E.g. the method you show in your edit (see also first link below) is based on a Mann–Whitney U-statistic.
You might want to dig a little deeper into the subject before you decide which method is best suited for your analysis. I provide you with some reading suggestions here: