I wrote a chi-square test of independence following an approach I found. While browsing Stack Overflow I came across another approach (can be seen here) that was a little more complicated, which made me wonder whether I chose the right one.
I'm looking for ways to check if my calculation is correct.
Here is the relevant code:
from scipy.stats import chi2_contingency
import pandas as p
...
# Example data
data[['Eczema', 'Gender']]
Eczema Gender
1 Healthy 0
4 Healthy 1
5 Healthy 0
6 Healthy 1
8 Healthy 1
.. ... ...
601 Healthy 0
603 Healthy 0
604 Healthy 1
606 Diseased 1
607 Healthy 1
# The contingency table:
p.crosstab(data['Eczema'], data['Gender'])
Gender 0 1
Eczema
Diseased 5 11
Healthy 219 233
# The calculation:
# Store the p-value under its own name so it does not overwrite the pandas alias `p`
chi2, p_value, dof, ex = chi2_contingency(p.crosstab(data['Eczema'], data['Gender']))
p_value
0.27176974714995455
Any suggestions will be welcomed. Thanks!
The other approach that you linked to is not actually a different method. The code in that question attempted to do the same calculations as those in chi2_contingency, but it had some mistakes.
Your code looks fine. With a p-value of 0.27, one would say that the data does not support rejecting the null hypothesis of no association between Eczema and Gender.
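If you want an independent check, one option is to redo the calculation by hand and compare it with the library result. The sketch below is only an illustration: it hard-codes the counts from the contingency table in the question instead of using your `data` frame, and the names `observed`, `expected`, `stat_manual` and `p_manual` are my own. It assumes the default behaviour of chi2_contingency, which applies Yates' continuity correction when there is one degree of freedom (as in a 2x2 table).

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Counts copied from the contingency table in the question
# (rows: Diseased, Healthy; columns: Gender 0, Gender 1).
observed = np.array([[5, 11],
                     [219, 233]])

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Yates' continuity correction: shrink each |observed - expected| by 0.5
# (never below zero) before squaring.
abs_diff = np.abs(observed - expected)
corrected = np.maximum(abs_diff - 0.5, 0.0)

# Chi-square statistic and its p-value with 1 degree of freedom.
stat_manual = (corrected ** 2 / expected).sum()
p_manual = chi2.sf(stat_manual, df=1)

# Library result for comparison (correction=True is the default).
stat_scipy, p_scipy, dof, ex = chi2_contingency(observed)

print("manual:", stat_manual, p_manual)
print("scipy: ", stat_scipy, p_scipy, "dof =", dof)
```

Both p-values should agree with each other and with the roughly 0.27 you reported. If you pass `correction=False` to chi2_contingency you will get a different (smaller) p-value, which is a common reason two implementations of the same test disagree on a 2x2 table.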