I have a simple function get_start_of_period
which takes in a date and returns the start of that period. It looks as follows:
import datetime as dt
from dateutil.relativedelta import relativedelta
from typing import Literal
def get_start_of_period(date: dt.date, term_length: Literal[1, 3, 12]) -> dt.date:
months_in_period = (date.month - 1) % term_length
return date - relativedelta(day=1, months=months_in_period)
This function takes any input date and converts it to the beginning of the month, quarter, or year depending on the term_length
param:
>>> curr_date = dt.date(2024, 6, 7)
>>> get_start_of_period(curr_date, term_length=1)
datetime.date(2024, 6, 1)
>>> get_start_of_period(curr_date, term_length=3)
datetime.date(2024, 4, 1)
>>> get_start_of_period(curr_date, term_length=12)
datetime.date(2024, 1, 1)
I have a Pandas Series of dates which I would like to convert to the start of their respective period in this manner. You can assume term_length
will be constant for the whole Series. Is there a way to vectorize that conversion? Currently I just have this written as a series.apply(lambda curr_date: get_start_of_period(curr_date, term_length=1)
but I'm wondering if I can improve that.
The pd.DateOffset
appears to only accept a constant, and pd.Timedelta
doesn't take month-level granularity. numpy supports a timedelta64[M]
object, but the behavior looks odd -- adding a delta of 1 month adds 30 days, while adding n > 1
months adds two months minus a day.
Most solutions I see online appear to use apply
, just wondering if we can do one better.
Thank you!
Turns out Pandas has MonthBegin
, QuarterBegin
, and YearBegin
, which do what I need. Think this does the trick:
def get_start_of_period_pd(series: pd.Series, term_length: int) -> pd.Series:
if term_length == 1:
end_shift = pd.offsets.MonthEnd(n=0)
start_shift = pd.offsets.MonthBegin(n=1)
elif term_length == 3:
end_shift = pd.offsets.QuarterEnd(n=0, startingMonth=3)
start_shift = pd.offsets.QuarterBegin(startingMonth=1)
else:
end_shift = pd.offsets.YearEnd(n=0, month=12)
start_shift = pd.offsets.YearBegin(month=1)
return series - end_shift - start_shift