[SOLVED] Chi-Squared Test


I have two categorical features and need to apply a Chi-squared test to them. I couldn't figure out how to use the chi-square tests available in existing modules. Can you help me with a function that returns the p-value so I can test the null hypothesis?

Solution

Here is a function that computes the Chi-squared test statistic and its p-value from two pandas Series (for example, two columns of a DataFrame).

import pandas as pd
from scipy import stats
def my_chi2(column, target):
    """
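    Compute the chi^2 test statistic and its p-value between column and target.
    Input:
        column: pandas Series (categorical)
        target: pandas Series (categorical)
    Output:
        chi_square: float
            Test statistic, the sum of (O - E)^2 / E over all cells
        p_value: float
            Probability of a statistic at least this large under the
            null hypothesis of independence (1 - CDF)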
    """
    # create contingency table
    data_crosstab = pd.crosstab(column,target, margins=True, margins_name="Total")
    # Calculation of the chi-square test statistic
    chi_square = 0
    rows = column.unique()
    columns = target.unique()
    for i in columns:
        for j in rows:
            O = data_crosstab[i][j]
            E = data_crosstab[i]['Total'] * data_crosstab['Total'][j] / data_crosstab['Total']['Total']
            chi_square += (O-E)**2/E
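    # The p-value approach: upper tail of the chi-squared distribution
    # (not the normal distribution) with (rows - 1) * (columns - 1) degrees of freedom
    dof = (len(rows) - 1) * (len(columns) - 1)
    p_value = stats.chi2.sf(chi_square, dof)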
    return chi_square, p_value
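
As a sanity check, the same test can be run with SciPy's built-in scipy.stats.chi2_contingency. The sketch below is only an illustration: the DataFrame and its column names (gender, bought) are made-up toy data, not part of the original question. It passes correction=False because SciPy applies Yates' continuity correction to 2x2 tables by default, which would not match the plain formula used in the function above.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data, invented purely for illustration
df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "M", "F", "M", "F", "M", "F"],
    "bought": ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "no", "yes"],
})

# Contingency table of observed counts (no margins: SciPy expects counts only)
observed = pd.crosstab(df["gender"], df["bought"])

# correction=False disables Yates' continuity correction so the statistic
# follows the plain Pearson formula sum((O - E)^2 / E)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

# The p-value is the upper tail (survival function) of the chi-squared distribution
p_manual = stats.chi2.sf(chi2_stat, dof)

print(f"chi2 = {chi2_stat:.4f}, dof = {dof}, p = {p_value:.4f}")
```

If the p-value is below the chosen significance level (commonly 0.05), the null hypothesis that the two features are independent is rejected. On the same two columns, my_chi2 should return the same statistic and p-value as this call.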