Suppose I create a dataframe:
In [1]: df = pd.DataFrame({'a':[1,1,1,2,2,2], 'b':[1,2,3,4,5,6]})
If I do most statistics on a grouped version of that dataframe, they come out as expected:
In [2]: df.groupby('a').median()
Out[2]:
   b
a
1  2
2  5
But when I calculate the mean absolute deviation (mad), I get an extra column 'a', which is all zeros:
In [3]: df.groupby('a').mad()
Out[3]:
   a         b
a
1  0  0.666667
2  0  0.666667
The mad() function seems to work fine on a normal dataframe, just not on a grouped one. Or is this a feature, not a bug, that I just don't understand? Thoughts?
This is a bug, slated to be fixed for 0.14 (releasing soon), see here. The bug is that non-cythonized routines are effectively calling apply rather than agg.
A workaround is:
df.groupby('a').agg(lambda x: x.mad())
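If mad() is not available in your pandas version (as far as I know it was deprecated in 1.5 and removed in 2.0), the same agg approach should work with the mean absolute deviation written out by hand. This is only a sketch: mean_abs_dev is a helper name I made up, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b': [1, 2, 3, 4, 5, 6]})

def mean_abs_dev(s):
    # Hypothetical helper (not a pandas function): mean absolute
    # deviation of a Series around its own mean.
    return (s - s.mean()).abs().mean()

# agg applies the helper to each column within each group,
# so the grouping key 'a' stays in the index rather than
# reappearing as a column.
result = df.groupby('a').agg(mean_abs_dev)
print(result)
```

Going through agg rather than apply is the point here: each group's 'b' values are reduced to a single number, and only 'b' should appear in the result.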