
Conditional groupby in a for loop


I am using the code from an answer to a similar question. To make this question self-contained, I repeat the code here:

scipy.stats.kruskal(*[group["variable"].values for name, group in df.groupby("treatment")])

This groups the rows by class and applies a function (here, the Kruskal-Wallis test) to the per-class samples. My question is how to exclude classes with a small sample size, e.g., ignore classes with fewer than 5 samples?

Thank you in advance.


Solution

  • OK, it was easier than I thought. Inspired by this example, I added a condition to the list comprehension so that only groups with more than 4 samples are kept:

    scipy.stats.kruskal(*[group["variable"].values for name, group in df.groupby("treatment") if group["variable"].values.size > 4])
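
For anyone who wants to try this end to end, here is a minimal sketch of the same idea on made-up data. The DataFrame contents and the `MIN_SAMPLES` name are assumptions for illustration only; the column names `treatment` and `variable` follow the question. It uses `groupby(...).filter(...)` as an alternative way to drop small groups before running the test, which may read more clearly than a condition inside the comprehension.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: treatment "c" has only 2 samples.
df = pd.DataFrame({
    "treatment": ["a"] * 5 + ["b"] * 6 + ["c"] * 2,
    "variable": [1.0, 2.0, 3.0, 4.0, 5.0,
                 2.5, 3.5, 4.5, 5.5, 6.5, 7.5,
                 10.0, 11.0],
})

MIN_SAMPLES = 5  # assumed threshold, taken from the question

# Keep only the rows whose treatment group has at least MIN_SAMPLES rows.
filtered = df.groupby("treatment").filter(lambda g: len(g) >= MIN_SAMPLES)

# One array of values per remaining treatment.
samples = [g["variable"].to_numpy() for _, g in filtered.groupby("treatment")]

# The Kruskal-Wallis test needs at least two groups to compare.
result = stats.kruskal(*samples) if len(samples) >= 2 else None
if result is not None:
    print(result.statistic, result.pvalue)
```

Note that `len(g)` counts rows including missing values; if `variable` can contain NaN, `g["variable"].count()` would count only the non-missing entries instead.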