
p value generated by scipy.stats.chi2_contingency for independence testing


To test whether two features A and B are independent:

H0: A and B are independent
H1: A and B are dependent

If p < 0.05, we reject H0 and conclude that A and B are dependent.
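The decision rule above can be sketched as a small helper. This is a minimal sketch: the function name `is_dependent`, the 0.05 default, and the second example table are illustrative assumptions. It only compares the p-value returned by `scipy.stats.chi2_contingency` against alpha.

```python
import numpy as np
from scipy.stats import chi2_contingency

def is_dependent(observed, alpha=0.05):
    """Hypothetical helper: True if H0 (independence) is rejected at level alpha."""
    # chi2_contingency returns (statistic, p-value, dof, expected frequencies)
    chi2, p, dof, expected = chi2_contingency(np.asarray(observed))
    # Reject H0 only when the p-value falls below alpha
    return bool(p < alpha)

print(is_dependent([[10, 10, 10], [10, 10, 10]]))  # identical rows
print(is_dependent([[25, 5, 5], [5, 5, 25]]))      # rows with different distributions
```

The first table has the same distribution in every row, so H0 is not rejected; the second concentrates counts in different columns per row, so it is.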

I tried the following code, where it seems clear that the two arrays are dependent (they are identical):

import numpy as np
import scipy.stats

obs = np.array([[10, 10, 10], [10, 10, 10]])
print(scipy.stats.chi2_contingency(obs))

I get the following result:

(0.0, 1.0, 2, array([[10., 10., 10.],
        [10., 10., 10.]]))

i.e. the p-value is 1.0 > 0.05, so we cannot reject the null hypothesis that the two arrays are independent of each other.
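To see why the statistic is 0.0 and the p-value is 1.0, the expected frequencies under H0 can be computed by hand with the usual formula E_ij = (row i total × column j total) / grand total and compared with the observed counts. A minimal NumPy sketch, assuming the same `obs` as above:

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 10], [10, 10, 10]])

# Expected count under independence: E_ij = row_i total * col_j total / grand total
row_totals = obs.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = obs.sum(axis=0, keepdims=True)   # shape (1, 3)
expected = row_totals * col_totals / obs.sum()

# Pearson statistic: sum over all cells of (O - E)^2 / E
stat_by_hand = ((obs - expected) ** 2 / expected).sum()

chi2, p, dof, expected_scipy = chi2_contingency(obs)
print(expected)                # every cell equals the observed count of 10
print(stat_by_hand, chi2, p)   # statistic 0, p-value 1
```

Because every observed count equals its expected count, the statistic is exactly 0 and the p-value is 1. SciPy applies Yates' continuity correction only when dof is 1, so for this 2×3 table (dof = 2) it reports the plain Pearson statistic.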

Is there an assumption I got wrong, or is it returning 1 - p instead of p?


Solution

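  • The result you get is correct. It means that the two variables described by your table are independent: they have no association with each other. Independence means that the occurrence of one event does not affect the probability of the other. Note that chi2_contingency expects a contingency table (counts cross-tabulated by the categories of A in the rows and the categories of B in the columns), not the two feature arrays stacked as rows.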

    In your example, every row has the same distribution across the columns (10, 10, 10), so knowing the category of A tells you nothing about the category of B. In probability terms, event A does not depend on event B:

      P(A|B) = P(A)  or P(B|A) = P(B)
    

    which reads: the probability of event A given event B is the same as the probability of A, because A and B are independent. Thus P(A|B) equals P(A) and P(B|A) equals P(B), which is consistent with a chi-squared statistic of 0 (every observed count equals its expected count).
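The conditional-probability argument can be checked numerically, and contrasted with what happens when two identical feature arrays are cross-tabulated first. This is a minimal sketch: the `crosstab` helper and the arrays `a` and `b` are hypothetical illustrations, not part of SciPy.

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 10], [10, 10, 10]])

# P(B | A = row i): normalise each row so it sums to 1
p_b_given_a = obs / obs.sum(axis=1, keepdims=True)
# P(B): column totals divided by the grand total
p_b = obs.sum(axis=0) / obs.sum()
print(p_b_given_a)  # every row is [1/3, 1/3, 1/3]
print(p_b)          # also [1/3, 1/3, 1/3], so P(B|A) = P(B)

def crosstab(a, b):
    """Hypothetical helper: count co-occurrences of the categories of a and b."""
    a_levels, a_idx = np.unique(a, return_inverse=True)
    b_levels, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_levels.size, b_levels.size), dtype=int)
    np.add.at(table, (a_idx, b_idx), 1)  # increment cell (category of a, category of b)
    return table

# Two identical feature arrays (hypothetical data): B is fully determined by A
a = np.array(["x", "y", "z"] * 10)
b = a.copy()
table = crosstab(a, b)  # diagonal table: each category only co-occurs with itself
chi2, p, dof, expected = chi2_contingency(table)
print(table)
print(p < 0.05)  # identical features are strongly dependent
```

Cross-tabulating two identical features puts all counts on the diagonal, which is as far from independence as a table can be, so the p-value is tiny.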