
pandas: Composition for chained methods like .resample(), .rolling() etc


I would like to construct an extension of pandas.DataFrame (let's call it SPDF) that can do things above and beyond what a plain DataFrame can:

import pandas as pd
import numpy as np


def to_spdf(func):
    """Transform generic output of `func` to SPDF.

    Returns
    -------
    wrapper : callable
    """
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        return SPDF(res)

    return wrapper


class SPDF:
    """Special-purpose dataframe.

    Parameters
    ----------
    df : pandas.DataFrame

    """

    def __init__(self, df):
        self.df = df

    def __repr__(self):
        return repr(self.df)

    def __getattr__(self, item):
        res = getattr(self.df, item)

        if callable(res):
            res = to_spdf(res)

        return res


if __name__ == "__main__":

    # construct a generic SPDF
    df = pd.DataFrame(np.eye(4))
    an_spdf = SPDF(df)

    # call .diff() to obtain another SPDF
    print(an_spdf.diff())

Right now, DataFrame methods that return another DataFrame, such as .diff() in the MWE above, give me back another SPDF, which is great. However, I would also like chained calls such as .resample('M').last() or .rolling(2).mean() to produce an SPDF at the very end. So far I have failed: .rolling() and the like are callables, so my wrapper to_spdf builds an SPDF from their intermediate output (a rolling or resampler object) instead of 'waiting' for .mean() or whatever else ends the expression. Any ideas how to tackle this problem?

Thanks.
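If you would rather keep the composition design from the question, one minimal sketch is to wrap only actual DataFrame results and to keep forwarding attribute access through the intermediate objects that .rolling(), .resample() and similar methods return. The helper names `_wrap` and `_Proxy` are hypothetical, and the check that intermediate classes live under the `pandas.core.window`, `pandas.core.resample` and `pandas.core.groupby` modules is an assumption about pandas internals, not a public pandas API.

```python
import numpy as np
import pandas as pd

# Assumption: the intermediate objects returned by .rolling(), .resample(),
# .groupby() etc. are defined in these pandas modules (internal layout).
_INTERMEDIATE_MODULES = ("pandas.core.window", "pandas.core.resample", "pandas.core.groupby")


def _wrap(res):
    """Hypothetical helper: convert a forwarded result so chains end in an SPDF."""
    if isinstance(res, pd.DataFrame):
        return SPDF(res)  # final DataFrame result: wrap it
    if type(res).__module__.startswith(_INTERMEDIATE_MODULES):
        return _Proxy(res)  # intermediate object: keep forwarding
    if callable(res):
        def wrapper(*args, **kwargs):
            # Wait until the method is actually called, then wrap its return value.
            return _wrap(res(*args, **kwargs))
        return wrapper
    return res  # scalars, Series, tuples, ... are returned unchanged


class _Proxy:
    """Hypothetical helper that forwards attribute access to an intermediate object."""

    def __init__(self, obj):
        self._obj = obj

    def __repr__(self):
        return repr(self._obj)

    def __getattr__(self, item):
        return _wrap(getattr(self._obj, item))


class SPDF:
    """Special-purpose dataframe (composition, as in the question)."""

    def __init__(self, df):
        self.df = df

    def __repr__(self):
        return repr(self.df)

    def __getattr__(self, item):
        return _wrap(getattr(self.df, item))


if __name__ == "__main__":
    idx = pd.date_range("2024-01-01", periods=4, freq="D")
    an_spdf = SPDF(pd.DataFrame(np.eye(4), index=idx))
    print(type(an_spdf.rolling(2).mean()).__name__)      # SPDF
    print(type(an_spdf.resample("2D").last()).__name__)  # SPDF
```

The design choice here is that the wrapper decides what to do only once a value is produced: a DataFrame ends the chain, an intermediate object is proxied, and a method is deferred until it is called.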


Solution

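  • You should be properly subclassing DataFrame. For copy-constructor methods (methods such as .diff() that build a new frame from an existing one) to return your class, the pandas documentation on subclassing says you must override the _constructor property (and, where needed, _constructor_sliced, _constructor_expanddim and _metadata).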

    You could do something like the following:

    from pandas import DataFrame


    class SPDF(DataFrame):

        @property
        def _constructor(self):
            # pandas calls this to build same-shaped results (e.g. from .diff())
            return SPDF

    If you also need custom attributes (not methods; those are always available) to be preserved by copy-constructor methods such as .diff(), list them in _metadata:

    from pandas import DataFrame


    class SPDF(DataFrame):
        # attribute names listed here are copied onto results by pandas
        _metadata = ['prop']
        prop = 1

        @property
        def _constructor(self):
            return SPDF

    Notice the output is as desired:

    import numpy as np

    df = SPDF(np.eye(4))
    print(type(df))
    # <class '__main__.SPDF'>
    new = df.diff()
    print(type(new))
    # <class '__main__.SPDF'>
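A self-contained version of the subclassing approach might look like the sketch below. It also checks that a custom attribute listed in _metadata survives .diff(), and shows what the chained calls from the question return. The attribute name prop, the value 5 and the daily date index are illustrative assumptions. Whether .rolling(...).mean() and .resample(...).last() come back as SPDF depends on your pandas version, so the sketch prints those types rather than assuming them.

```python
import numpy as np
import pandas as pd


class SPDF(pd.DataFrame):
    """Special-purpose dataframe built by subclassing pandas.DataFrame."""

    # Attribute names listed here are copied onto results by pandas' __finalize__.
    _metadata = ["prop"]
    prop = 1  # class-level default for the custom attribute

    @property
    def _constructor(self):
        # pandas calls this to build same-shaped results (e.g. from .diff()).
        return SPDF


df = SPDF(np.eye(4), index=pd.date_range("2024-01-01", periods=4, freq="D"))
df.prop = 5  # instance value that should survive copy-constructor methods

new = df.diff()
print(type(new).__name__, new.prop)  # SPDF 5

# Chained window/resample calls: the result type depends on the pandas version.
print(type(df.rolling(2).mean()).__name__)
print(type(df.resample("2D").last()).__name__)
```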