How Does R Calculate the False Discovery Rate?


I appear to be getting inconsistent results when I use R's p.adjust function to calculate the False Discovery Rate. Based on the paper cited in the documentation, the adjusted p-value should be calculated like this:

adjusted_p[i] = p[i] * (total_number_of_tests / i)

where the p-values are sorted in ascending order, so i is the rank of p[i].
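
In R terms, the calculation I expect looks like this (my own sketch):

    p <- c(0.0001, 0.0004, 0.0019)   # p-values, already in ascending order
    m <- length(p)                   # total number of tests
    p * m / seq_len(m)               # p[i] * (m / i): 0.0003 0.0006 0.0019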

Now when I run p.adjust(c(0.0001, 0.0004, 0.0019), "fdr") I get the expected result of

c(0.0003, 0.0006, 0.0019)

but when I run p.adjust(c(0.517479039, 0.003657195, 0.006080152), "fdr") I get this:

c(0.517479039, 0.009120228, 0.009120228)

instead of the result I calculate by hand:

c(0.517479039, 0.010971585, 0.009120228)
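
That vector is the same formula applied by rank (again my own sketch):

    p <- c(0.517479039, 0.003657195, 0.006080152)
    m <- length(p)
    i <- rank(p)   # rank of each p-value when sorted ascending: 3 1 2
    p * m / i      # 0.517479039 0.010971585 0.009120228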

What is R doing to the data that can account for both of these results?


Solution

  • The reason is that the FDR calculation ensures that FDR never increases as the p-value decreases. That's because you can always choose to set a higher threshold for your rejection rule if that higher threshold will get you a lower FDR.

    In your case, your second hypothesis had a p-value of 0.003657195 and a naively adjusted value of 0.010971585, but the third hypothesis had a larger p-value (0.006080152) and a smaller FDR (0.009120228). If you can achieve an FDR of 0.009120228 by setting your p-value threshold to 0.006080152, there is never a reason to set a lower threshold just to get a higher FDR.

    You can see this in the code by typing p.adjust:

    ...
    }, BH = {
        i <- lp:1L
        o <- order(p, decreasing = TRUE)
        ro <- order(o)
        pmin(1, cummin(n/i * p[o]))[ro]
    }, ...

    The cummin function takes the cumulative minimum of the vector; since o puts p in decreasing order, that minimum runs from the largest p-value down to the smallest.
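
    To see the mechanics, you can replay those lines on your second vector (a sketch; the names mirror the p.adjust source):

        p <- c(0.517479039, 0.003657195, 0.006080152)
        n <- lp <- length(p)              # number of tests
        i <- lp:1L                        # 3 2 1
        o <- order(p, decreasing = TRUE)  # positions from largest p to smallest: 1 3 2
        ro <- order(o)                    # inverse permutation back to input order
        n/i * p[o]                        # naive: 0.517479039 0.009120228 0.010971585
        cummin(n/i * p[o])                # 0.517479039 0.009120228 0.009120228
        pmin(1, cummin(n/i * p[o]))[ro]   # matches p.adjust(p, "fdr")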

    You can see this in the Benjamini-Hochberg paper you link to, including in the definition of the procedure on page 293, which states (emphasis mine):

    let k be the *largest* i for which P(i) <= (i/m) q*;
    then reject all H(i), i = 1, 2, ..., k
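
    In other words, the adjusted p-values restate that rule per hypothesis: you reject at level q exactly when the adjusted value is at or below q. Here is a sketch of the paper's step-up procedure applied to your data (the helper name bh_reject and the level q = 0.01 are my own choices):

        bh_reject <- function(p, q) {
            m <- length(p)
            s <- sort(p)
            # largest i with P(i) <= (i/m) * q, or 0 if no i qualifies
            k <- max(c(0L, which(s <= seq_len(m) / m * q)))
            if (k == 0L) rep(FALSE, m) else p <= s[k]   # reject H(1), ..., H(k)
        }

        p <- c(0.517479039, 0.003657195, 0.006080152)
        bh_reject(p, q = 0.01)       # FALSE TRUE TRUE
        p.adjust(p, "fdr") <= 0.01   # FALSE TRUE TRUE -- the same rejections

    Note that the smallest p-value fails its own threshold (0.003657195 > (1/3) * 0.01) yet is still rejected because the second-smallest passes its threshold; that step-up behavior is exactly what the cummin accomplishes.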