python-3.x pandas apply kruskal-wallis

Pandas apply kruskal-wallis to numeric columns


I have a dataframe with 27 columns: 26 are numeric variables, and the 27th tells me which group each row belongs to. There are 7 groups in total. I'm trying to apply the Kruskal-Wallis test to each variable, split by group, to determine whether the groups differ significantly.

I have tried:

df.groupby(['treatment']).apply(kruskal)

which throws the error "Need at least two groups in stats.kruskal()".

My other attempts haven't produced an output either. I'll be doing similar analyses on a regular basis and with larger datasets. Can someone help me understand this issue and how to fix it?


Solution

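  • groupby(...).apply(kruskal) calls kruskal once per group, passing that group's DataFrame as the only argument, so the test never sees more than one sample. kruskal needs the samples from all groups as separate arguments in a single call. With SciPy, you can do that for one variable like this: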

    import scipy.stats
    scipy.stats.kruskal(*[group["variable"].values for name, group in df.groupby("treatment")])
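
To cover all 26 variables at once, the same call can be wrapped in a loop over the numeric columns. The following is only a sketch: the helper name kruskal_by_group, the output column names H and p_value, and the toy dataframe are made up for illustration, and it assumes the grouping column is called "treatment" as in the question. Missing values are dropped per group before testing, which may or may not be what you want.

```python
import numpy as np
import pandas as pd
from scipy import stats


def kruskal_by_group(df, group_col):
    """Run Kruskal-Wallis on every numeric column of df, split by group_col.

    Hypothetical helper: returns one row per variable with the H statistic
    and the p-value.
    """
    # Numeric columns only; exclude the grouping column if it is numeric too
    numeric_cols = df.select_dtypes(include="number").columns.drop(
        group_col, errors="ignore"
    )
    grouped = df.groupby(group_col)
    rows = {}
    for col in numeric_cols:
        # One array per group, with missing values dropped
        samples = [g[col].dropna().to_numpy() for _, g in grouped]
        h, p = stats.kruskal(*samples)
        rows[col] = {"H": h, "p_value": p}
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data (made up): "x" has no group effect, "y" is shifted per group
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "treatment": np.repeat(["a", "b", "c"], 10),
    "x": rng.normal(size=30),
    "y": np.concatenate([
        rng.normal(0, 1, 10),
        rng.normal(5, 1, 10),
        rng.normal(10, 1, 10),
    ]),
})

results = kruskal_by_group(toy, "treatment")
print(results)
```

Calling kruskal_by_group(df, "treatment") on the real dataframe should give a 26-row table that can be filtered on p_value. Building the groupby once and reusing it inside the loop avoids regrouping for every column, which helps on larger datasets.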