pythonscipystatisticsbinomial-cdf

Difference in binom.cdf and binom_test in python


I am running a binomial test and I cannot understand why these two methods have different results. The probs from the second to the first are different. When we calculate the two-tailed p-value should we just double one of the tails?

from scipy.stats import binom

n, p = 50, 0.4
prob = binom.cdf(2, n, p)
first = 2*prob

from scipy import stats

second = stats.binom_test(2, n, p, alternative='two-sided')

Solution

  • When we calculate the two-tailed p-value should we just double one of the tails?

    No, because the binomial distribution is not, in general, symmetric. The one case where your calculation would work is p = 0.5.

    Here's a visualization for the two-sided binomial test. For this demonstration, I'll use n=14 instead of n=50 to make the plot clearer.

    Plot of binomial distribution

    The dashed line is drawn at the height of binom.pmf(2, n, p). The probabilities that contribute to the two-sided binomial test binom_test(2, n, p, alternative='two-sided') are those that are less than or equal to this value. In this example, we can see that the values of k where this is true are [0, 1, 2] (which is the left tail) and [10, 11, 12, 13, 14] (which is the right tail). The p-value of the two-sided binomial test should be the sum of these probabilities. And that is, in fact, what we find:

    In [20]: binom.pmf([0, 1, 2, 10, 11, 12, 13, 14], n, p).sum()
    Out[20]: 0.05730112258048004
    
    In [21]: binom_test(2, n, p)
    Out[21]: 0.05730112258047999
    

    Note that scipy.stats.binom_test is deprecated. Users of SciPy 1.7.0 or later should use scipy.stats.binomtest instead:

    In [36]: from scipy.stats import binomtest
    
    In [37]: result = binomtest(2, n, p)
    
    In [38]: result.pvalue
    Out[38]: 0.05730112258047999
    

    Here's the script to generate the plot:

    import numpy as np
    from scipy.stats import binom
    import matplotlib.pyplot as plt
    
    
    n = 14
    p = 0.4
    
    k = np.arange(0, n+1)
    
    plt.plot(k, binom.pmf(k, n, p), 'o')
    plt.xlabel('k')
    plt.ylabel('pmf(k, n, p)')
    plt.title(f'Binomial distribution with {n=}, {p=}')
    ax = plt.gca()
    ax.set_xticks(k[::2])
    
    pmf2 = binom.pmf(2, n, p)
    plt.axhline(pmf2, linestyle='--', color='k', alpha=0.5)
    
    plt.grid(True)
    plt.show()