python pandas contingency

Python/Pandas: Making a contingency table with multiple variables


My dataframe has 4 columns (one dependent variable and 3 independent).

Here's a sample:

Sample data

My desired output is a contingency table, as follows:

Desired output

I can only seem to get a contingency table using one independent variable, with the following code (my df is called 'table'):

pd.crosstab(index=table['Dvar'],columns=table['Var1'])

I can't seem to add any other variables to this. Is the only way to achieve this to make a separate contingency table for each variable (Var1 to Var3) and then merge/join them?


Solution

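  • A contingency table shows how often each combination of categorical values occurs, which is useful for examining the relationship between features.

    To see the relationship between the three independent variables and the dependent variable in a single table, pass a list of columns to pd.crosstab: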

    pd.crosstab([table['Var1'], table['Var2'], table['Var3']],
                table['Dvar'], margins=False)
    

    To get the desired output you describe, however, use pandas.DataFrame.groupby instead, which sums each Var column within each Dvar group:

    table.groupby('Dvar').sum()
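
To compare the two approaches side by side, here is a minimal sketch. The sample data in the question was posted as an image, so the values below are invented for illustration, and it assumes Var1 to Var3 are 0/1 indicator columns (in which case the grouped sum counts the 1s per Dvar value). If your columns hold other kinds of values, the groupby result will mean something different.

```python
import pandas as pd

# Hypothetical data: the original sample is not available, so these
# values are made up. Var1..Var3 are ASSUMED to be 0/1 indicators.
table = pd.DataFrame({
    "Dvar": ["A", "A", "B", "B", "B", "C"],
    "Var1": [1, 0, 1, 1, 0, 0],
    "Var2": [0, 1, 1, 0, 0, 1],
    "Var3": [1, 1, 0, 0, 1, 0],
})

# Approach 1: one row per observed (Var1, Var2, Var3) combination,
# one column per Dvar value; cells hold row counts.
multi = pd.crosstab(
    index=[table["Var1"], table["Var2"], table["Var3"]],
    columns=table["Dvar"],
)

# Approach 2: one row per Dvar value; each Var column is summed
# within the group.
counts = table.groupby("Dvar").sum()

print(multi)
print(counts)
```

The crosstab keeps the three variables combined in a row MultiIndex, while the groupby result treats each variable independently, so pick whichever matches the table you actually need.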