pythonpandaspython-dateutil

Groupby column with relativedelta value


I have a dataframe with a column of the dateutil.relativedelta type. When I try grouping by this column, I get the error TypeError: '<' not supported between instances of 'relativedelta' and 'relativedelta'. I didn't find anythind in the pandas page that indicated that there restrictions on the types of columns that can be used to group the dataframe. Am I missing something?

Minimal example below:

from dateutil.relativedelta import relativedelta
import pandas as pd
import itertools

def expand_grid(data_dict):
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

ref_dates = pd.date_range(start="2024-06", end="2025-02", freq="MS").tolist()
windows = [relativedelta(months=-30),relativedelta(months=-18)]
max_horizon = [relativedelta(months=2)]  
params = expand_grid({'ref_date': ref_dates, 'max_horizon': max_horizon, 'window': windows})

for name, group in params.groupby(by=['window']): print(name)

Solution

  • It not problem with groupby() but with object relativedelta().

    It seems it doesn't have functions like __lt__() (lower than), __gt__() (grater than) and similar to compare two values.

    Try

    relativedelta(months=1) < relateivedelta(months=2)
    

    and you get the same error without groupby

    But this works

    relativedelta(months=1).months < relativedelta(months=2).months
    

    So you may have to create new column with integer values .months and use this column for comparition.


    I never used it before but I tested it with your code: groupby() can use function as by= and it sends index to this function

    groupbyby(by=lambda index:params.loc[index]['window'].months, ...)
    

    But there is other problem: relativedelta() keeps months=-18 as years=-2,months=-6 so it would need to get both values as tuple

    groupbyby(by=lambda index:(params.loc[index]['window'].months.years, params.loc[index]['window'].months), ...)
    

    or convert them to one value

    groupbyby(by=lambda index:(params.loc[index]['window'].months.years *12 + params.loc[index]['window'].months), ...)
    

    from dateutil.relativedelta import relativedelta
    import pandas as pd
    import itertools
    
    def expand_grid(data_dict):
        rows = itertools.product(*data_dict.values())
        return pd.DataFrame.from_records(rows, columns=data_dict.keys())
    
    ref_dates = pd.date_range(start="2024-06", end="2025-02", freq="MS").tolist()
    windows = [relativedelta(months=-30),relativedelta(months=-18)]
    max_horizon = [relativedelta(months=2)]  
    params = expand_grid({'ref_date': ref_dates, 'max_horizon': max_horizon, 'window': windows})
    
    for name, group in params.groupby(by=lambda index:(params.loc[index]['window'].years*12+params.loc[index]['window'].months) ): 
        print(name)
        print(group)
    

    I found similar question for relativedelta() (12 years old) on Stackoverflow:

    python - Comparing dateutil.relativedelta - Stack Overflow

    And issue in dateutil repo: (year 2017)

    relativedelta does not implement __lt__ · Issue #350 · dateutil/dateutil