I have a dataframe with a column of the dateutil.relativedelta
type. When I try grouping by this column, I get the error TypeError: '<' not supported between instances of 'relativedelta' and 'relativedelta'
. I didn't find anythind in the pandas page that indicated that there restrictions on the types of columns that can be used to group the dataframe. Am I missing something?
Minimal example below:
from dateutil.relativedelta import relativedelta
import pandas as pd
import itertools
def expand_grid(data_dict):
rows = itertools.product(*data_dict.values())
return pd.DataFrame.from_records(rows, columns=data_dict.keys())
ref_dates = pd.date_range(start="2024-06", end="2025-02", freq="MS").tolist()
windows = [relativedelta(months=-30),relativedelta(months=-18)]
max_horizon = [relativedelta(months=2)]
params = expand_grid({'ref_date': ref_dates, 'max_horizon': max_horizon, 'window': windows})
for name, group in params.groupby(by=['window']): print(name)
It not problem with groupby()
but with object relativedelta()
.
It seems it doesn't have functions like __lt__()
(lower than), __gt__()
(grater than) and similar to compare two values.
Try
relativedelta(months=1) < relateivedelta(months=2)
and you get the same error without groupby
But this works
relativedelta(months=1).months < relativedelta(months=2).months
So you may have to create new column with integer values .months
and use this column for comparition.
I never used it before but I tested it with your code: groupby()
can use function as by=
and it sends index to this function
groupbyby(by=lambda index:params.loc[index]['window'].months, ...)
But there is other problem: relativedelta()
keeps months=-18
as years=-2,months=-6
so it would need to get both values as tuple
groupbyby(by=lambda index:(params.loc[index]['window'].months.years, params.loc[index]['window'].months), ...)
or convert them to one value
groupbyby(by=lambda index:(params.loc[index]['window'].months.years *12 + params.loc[index]['window'].months), ...)
from dateutil.relativedelta import relativedelta
import pandas as pd
import itertools
def expand_grid(data_dict):
rows = itertools.product(*data_dict.values())
return pd.DataFrame.from_records(rows, columns=data_dict.keys())
ref_dates = pd.date_range(start="2024-06", end="2025-02", freq="MS").tolist()
windows = [relativedelta(months=-30),relativedelta(months=-18)]
max_horizon = [relativedelta(months=2)]
params = expand_grid({'ref_date': ref_dates, 'max_horizon': max_horizon, 'window': windows})
for name, group in params.groupby(by=lambda index:(params.loc[index]['window'].years*12+params.loc[index]['window'].months) ):
print(name)
print(group)
I found similar question for relativedelta()
(12 years old) on Stackoverflow:
python - Comparing dateutil.relativedelta - Stack Overflow
And issue in dateutil
repo: (year 2017)
relativedelta does not implement __lt__ · Issue #350 · dateutil/dateutil