I appear to be getting inconsistent results when I use R's p.adjust function to calculate the false discovery rate (FDR). Based on the paper cited in the documentation, the adjusted p-value should be calculated like this:
adjusted_p_at_index_i = p_at_index_i * (total_number_of_tests / i)
Now when I run p.adjust(c(0.0001, 0.0004, 0.0019),"fdr")
I get the expected results of
c(0.0003, 0.0006, 0.0019)
but when I run p.adjust(c(0.517479039, 0.003657195, 0.006080152),"fdr")
I get this
c(0.517479039, 0.009120228, 0.009120228)
Instead of the result I calculate:
c(0.517479039, 0.010971585, 0.009120228)
What is R doing to the data that can account for both of these results?
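For reference, here is the by-hand calculation I am doing (a direct translation of the formula above, with no correction of any kind; naive_fdr is just my own name for it):

```r
# Naive adjustment: p_i * (m / rank_i), where m is the number of tests and
# rank_i is the rank of p_i among all p-values (smallest p gets rank 1).
naive_fdr <- function(p) {
  m <- length(p)
  p * m / rank(p)
}

naive_fdr(c(0.517479039, 0.003657195, 0.006080152))
# 0.517479039 0.010971585 0.009120228  -- my expected result, not what R returns
```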
The reason is that the FDR calculation enforces monotonicity: the adjusted value never increases as the p-value decreases. That's because you can always choose to set a higher threshold for your rejection rule if that higher threshold gets you a lower FDR.
In your case, your second hypothesis had a p-value of 0.003657195 and thus an FDR of 0.010971585, but the third hypothesis had a larger p-value (0.006080152) and a smaller FDR (0.009120228). If you can achieve an FDR of 0.009120228 by setting your p-value threshold to 0.006080152, there is never a reason to set a lower threshold just to get a higher FDR.
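You can reproduce R's answer by applying that monotonicity correction to your own calculation (a sketch using only base R: sort by decreasing p-value, take the running minimum, then restore the input order):

```r
p <- c(0.517479039, 0.003657195, 0.006080152)
m <- length(p)
o <- order(p, decreasing = TRUE)  # indices from largest to smallest p-value
raw <- p[o] * m / (m:1)           # the plain formula at each rank
adj <- cummin(raw)[order(o)]      # enforce monotonicity, restore input order
adj
# 0.517479039 0.009120228 0.009120228  -- matches p.adjust(p, "fdr")
```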
You can see this in the code by typing p.adjust:
...
}, BH = {
    i <- lp:1L                        # ranks, from largest p-value to smallest
    o <- order(p, decreasing = TRUE)  # indices that sort p in decreasing order
    ro <- order(o)                    # mapping back to the original order
    pmin(1, cummin(n/i * p[o]))[ro]   # formula value, then running minimum
The cummin function takes the running (cumulative) minimum of the vector. Because p[o] is sorted in decreasing order, the minimum runs from the largest p-value backwards to the smallest.
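Tracing those lines on your first vector shows why the correction changes nothing there: n/i * p[o] is already decreasing, so cummin leaves it alone (n and lp are both length(p) inside p.adjust):

```r
p <- c(0.0001, 0.0004, 0.0019)
n <- length(p); lp <- n
i <- lp:1L                         # 3 2 1
o <- order(p, decreasing = TRUE)   # 3 2 1
ro <- order(o)                     # 3 2 1
n/i * p[o]                         # 0.0019 0.0006 0.0003 -- already decreasing
pmin(1, cummin(n/i * p[o]))[ro]
# 0.0003 0.0006 0.0019
```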
You can see this in the Benjamini-Hochberg paper you link to, including in the definition of the procedure on page 293, which states (emphasis mine):
let k be the *largest* i for which P_(i) <= (i/m) q*;
then reject all H_(i), i = 1, 2, ..., k
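That step-up rule can also be coded directly (a sketch; bh_reject is my own name for it): for a chosen level q, find the largest i with P_(i) <= (i/m) q and reject the i smallest p-values. Rejecting exactly when p.adjust(p, "fdr") <= q gives the same decisions.

```r
# Benjamini-Hochberg step-up procedure at level q (a sketch, not library code).
bh_reject <- function(p, q) {
  m <- length(p)
  ps <- sort(p)
  ok <- which(ps <= seq_len(m) / m * q)  # the i with P_(i) <= (i/m) q
  k <- if (length(ok)) max(ok) else 0L   # largest such i
  thr <- if (k > 0) ps[k] else -Inf
  p <= thr                               # reject the k smallest p-values
}

p <- c(0.517479039, 0.003657195, 0.006080152)
bh_reject(p, q = 0.01)
# FALSE TRUE TRUE  -- identical to p.adjust(p, "fdr") <= 0.01
```

Note the "step-up" behavior: even though the smallest p-value 0.003657195 fails its own comparison (it is larger than 0.01/3), it is still rejected, because a larger i satisfies the inequality.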