pythonpandaspython-datetime

Possible to vectorize date arithmetic with different offset per row?


I have a simple function get_start_of_period which takes in a date and returns the start of that period. It looks as follows:

import datetime as dt
from dateutil.relativedelta import relativedelta
from typing import Literal

def get_start_of_period(date: dt.date, term_length: Literal[1, 3, 12]) -> dt.date:
    months_in_period = (date.month - 1) % term_length
    return date - relativedelta(day=1, months=months_in_period)

This function takes any input date and converts it to the beginning of the month, quarter, or year depending on the term_length param:

>>> curr_date = dt.date(2024, 6, 7)
>>> get_start_of_period(curr_date, term_length=1)
datetime.date(2024, 6, 1)
>>> get_start_of_period(curr_date, term_length=3)
datetime.date(2024, 4, 1)
>>> get_start_of_period(curr_date, term_length=12)
datetime.date(2024, 1, 1)

I have a Pandas Series of dates which I would like to convert to the start of their respective period in this manner. You can assume term_length will be constant for the whole Series. Is there a way to vectorize that conversion? Currently I just have this written as a series.apply(lambda curr_date: get_start_of_period(curr_date, term_length=1) but I'm wondering if I can improve that.

The pd.DateOffset appears to only accept a constant, and pd.Timedelta doesn't take month-level granularity. numpy supports a timedelta64[M] object, but the behavior looks odd -- adding a delta of 1 month adds 30 days, while adding n > 1 months adds two months minus a day.

Most solutions I see online appear to use apply, just wondering if we can do one better.

Thank you!


Solution

  • Turns out Pandas has MonthBegin, QuarterBegin, and YearBegin, which do what I need. Think this does the trick:

    def get_start_of_period_pd(series: pd.Series, term_length: int) -> pd.Series:
        if term_length == 1:
            end_shift = pd.offsets.MonthEnd(n=0)
            start_shift = pd.offsets.MonthBegin(n=1)
        elif term_length == 3:
            end_shift = pd.offsets.QuarterEnd(n=0, startingMonth=3)
            start_shift = pd.offsets.QuarterBegin(startingMonth=1)
        else:
            end_shift = pd.offsets.YearEnd(n=0, month=12)
            start_shift = pd.offsets.YearBegin(month=1)
    
        return series - end_shift - start_shift