
Preserving DataFrame subclass type during pandas groupby().aggregate()

I'm subclassing pandas DataFrame in a project of mine. Most pandas operations preserve the subclass type, but df.groupby().agg() does not. Is this a bug? Is there a known workaround?

import pandas as pd

class MySeries(pd.Series):
    pass

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame
    _constructor_sliced = MySeries

MySeries._constructor_expanddim = MyDataFrame

df = MyDataFrame({"a": reversed(range(10)), "b": list('aaaabbbccc')})

print(type(df))
# <class '__main__.MyDataFrame'>

print(type(df.groupby("b").agg({"a": "sum"})))
# <class 'pandas.core.frame.DataFrame'>

It looks like a fix for an earlier issue (described here) made df.groupby preserve the subclass, but as far as I can tell df.groupby().agg() was missed. I'm using pandas 2.0.3.


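  • It turns out that groupby().agg() combines Series to build a DataFrame, so the Series subclass also needs its constructors defined: MySeries must provide _constructor and _constructor_expanddim, not just MyDataFrame. Declare these hooks with @property, because pandas reads them as attributes and then calls the class they return. See this documentation.

    The following code runs without errors: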

    import pandas as pd

    class MySeries(pd.Series):
        @property
        def _constructor(self):
            return MySeries

        @property
        def _constructor_expanddim(self):
            return MyDataFrame

    class MyDataFrame(pd.DataFrame):
        @property
        def _constructor(self):
            return MyDataFrame

        @property
        def _constructor_sliced(self):
            return MySeries

    df = MyDataFrame({"a": reversed(range(10)), "b": list('aaaabbbccc')})
    assert isinstance(df.groupby("b").agg({"a": "sum"}), MyDataFrame)
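A minimal sketch, assuming the four hooks are declared with @property as in the pandas subclassing documentation, that checks which hook is exercised where: selecting a column goes through _constructor_sliced, and pd.concat(..., axis=1) on Series builds its frame through _constructor_expanddim, a Series-to-DataFrame step analogous to the one described above. The helper ensure_type is hypothetical (not a pandas API); it is a blunt fallback that re-wraps a result with the subclass if some code path on your pandas version still returns a plain DataFrame.

```python
import pandas as pd


class MySeries(pd.Series):
    @property
    def _constructor(self):
        # Used when a Series operation returns another Series.
        return MySeries

    @property
    def _constructor_expanddim(self):
        # Used when Series are expanded or combined into a DataFrame.
        return MyDataFrame


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # Used when a DataFrame operation returns another DataFrame.
        return MyDataFrame

    @property
    def _constructor_sliced(self):
        # Used when a single column is pulled out as a Series.
        return MySeries


def ensure_type(result, cls):
    """Hypothetical fallback: re-wrap a plain pandas result in `cls`."""
    return result if isinstance(result, cls) else cls(result)


df = MyDataFrame({"a": range(9, -1, -1), "b": list("aaaabbbccc")})

col = df["a"]                               # goes through _constructor_sliced
frame = pd.concat([col, col * 2], axis=1)   # Series -> DataFrame via _constructor_expanddim
summed = ensure_type(df.groupby("b").agg({"a": "sum"}), MyDataFrame)

print(type(col).__name__, type(frame).__name__, type(summed).__name__)
print(summed["a"].tolist())
```

Because ensure_type only re-wraps when the type is wrong, it does nothing on versions where the constructor properties already carry the subclass through; the properties remain the actual fix.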