pythonstatisticsanalysiserror-correctionfalse-positive

Bonferroni correction of p-values from hypergeometric analysis


I have performed a hypergeometric analysis (using a python script) to investigate enrichment of GO-terms in a subset of genes. An example of my output is as follows:

GO00001 1500    300 200 150 5.39198144708e-77
GO00002 1500    500 400 350 1.18917839281e-160
GO00003 1500    400 350 320 9.48402847878e-209
GO00004 1500    100 100 75  3.82935778527e-82
GO00005 1500    100 80  80  2.67977253966e-114

where

Column1 = GO ID
Column2 = Total sum of all terms in the original dataset
Column3 = Total sum of [Column 1] IDs in the original dataset
Column4 = Sum of all terms in the subset
Column5 = Sum of [Column 1] IDs in subset
Column6 = pvalue derived from hypergeometric test

I know that I must multiply the number of experiments by the pvalue but I'm not sure how to do this with the data I have. Am I calculating from the subset or a combination of the original dataset and the subset? For example, would it be:

Column2 * Column5 * pvalue
Column3 * Column5 * pvalue
Column4 * Column5 * pvalue

I apologise if this seems like a stupid question but I just can't seem to get my head around it. Many thanks in advance!


Solution

  • from statsmodels.sandbox.stats.multicomp import multipletests
    p_adjusted = multipletests(Column6, method='bonferroni')
    

    Or am I missing something?..