python pandas

Unexpected behavior in pandas mad() with groupby()


Suppose I create a dataframe:

In [1]: df = pd.DataFrame({'a':[1,1,1,2,2,2], 'b':[1,2,3,4,5,6]})

If I do most statistics on a grouped version of that dataframe, they come out as expected:

In [2]: df.groupby('a').median()
Out[2]: 
   b
a   
1  2
2  5    

But when I calculate the mean absolute deviation (mad), I get an extra column 'a', which is all zeros:

In [3]: df.groupby('a').mad()
Out[3]: 
   a         b
a             
1  0  0.666667
2  0  0.666667

The mad() function seems to work fine on a normal dataframe, just not on a grouped one. Is this a bug, or a feature that I just don't understand? Thoughts?


Solution

  • This is a bug, slated to be fixed in 0.14 (releasing soon); see here. The bug is that non-cythonized routines effectively call `apply` rather than `agg`.

    A workaround is:

    df.groupby('a').agg(lambda x: x.mad())
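
For readers on a pandas version without mad() (as far as I know it was deprecated in 1.5 and removed in 2.0), here is a minimal sketch of the same workaround with the mean absolute deviation written out by hand. The helper name mean_abs_dev is made up for illustration and is not part of pandas; the sketch assumes mad() means the mean of |x - mean(x)|, which matches the 0.666667 values in the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b': [1, 2, 3, 4, 5, 6]})

def mean_abs_dev(s):
    # Hypothetical helper (not a pandas API): mean of |x - mean(x)|.
    return (s - s.mean()).abs().mean()

# agg calls the function on each non-key column within each group,
# so the grouping column 'a' only shows up as the index of the result.
result = df.groupby('a').agg(mean_abs_dev)
print(result)
```

For the group a == 1, column b is [1, 2, 3]: the mean is 2, the absolute deviations are 1, 0, 1, and their mean is 2/3, which agrees with the b column shown in the question.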