pythondataframepython-polars

How to count work days between date columns with Polars


I have the following DataFrame.

df = pl.from_repr("""
┌────────────┬───────────────┐
│ date       ┆ maturity_date │
│ ---        ┆ ---           │
│ date       ┆ date          │
╞════════════╪═══════════════╡
│ 2000-01-04 ┆ 2000-01-17    │
│ 2000-01-04 ┆ 2000-02-15    │
│ 2000-01-04 ┆ 2000-03-15    │
│ 2000-01-04 ┆ 2000-04-17    │
│ 2000-01-04 ┆ 2000-05-15    │
└────────────┴───────────────┘
""")

I'm trying to get the number of the workdays between date and maturity_date (not counting saturday and sunday)

I'd also like to calculate diff days that use a given calendar like a trade date calendar of stock market which is different from a normal calendar.

I use this date_ranges to count workdays, but it seems only a little faster than map_elements

df.with_columns(
    pl.date_ranges("date", "maturity_date")
    .list.eval(pl.element().dt.weekday() <= 5)
    .list.count_matches(True)
    .alias("workdays_diff")
    # pl.concat_list("date", "maturity_date").map_elements(lambda x: get_work_days(x[0], x[1]))
    # .alias("workdays_diff")
)
shape: (5, 3)
┌────────────┬───────────────┬───────────────┐ # ┌────────────────┐
│ date       ┆ maturity_date ┆ workdays_diff │ # │ tradedate_diff │
│ ---        ┆ ---           ┆ ---           │ # │ ---            │
│ date       ┆ date          ┆ u32           │ # │ i64            │
╞════════════╪═══════════════╪═══════════════╡ # ╞════════════════╡
│ 2000-01-04 ┆ 2000-01-17    ┆ 10            │ # │ 10             │
│ 2000-01-04 ┆ 2000-02-15    ┆ 31            │ # │ 21             │
│ 2000-01-04 ┆ 2000-03-15    ┆ 52            │ # │ 42             │
│ 2000-01-04 ┆ 2000-04-17    ┆ 75            │ # │ 65             │
│ 2000-01-04 ┆ 2000-05-15    ┆ 95            │ # │ 80             │
└────────────┴───────────────┴───────────────┘ # └────────────────┘

Is there a faster way?

Is there also a way to calculate tradedate_diff?


Solution

  • @Dean MacGregor's excellent answer using pure Polars wasn't quite performant enough for my use case. Using Numpy's built in busday_count function turned out to be much faster in my case and can easily be converted back to a polars with pl.from_numpy.

    nyse_holidays=pl.Series([
        date(2000,1,17),
        date(2000,2,21),
        date(2000,4,21),
        date(2000,5,29),
        date(2000,7,4),
        date(2000,9,4),
        date(2000,11,23),
        date(2000,12,25),
    ])
    df = pl.DataFrame(
        {
            "business_days": pl.from_numpy(np.busday_count(data["date"], data["maturity_date"]),["business_days"])["business_days"],
            "trade_days": pl.from_numpy(np.busday_count(data["date"], data["maturity_date"], holidays=nyse_holidays),["trade_days"])["trade_days"],
        }
    )
    shape: (5, 2)
    ┌───────────────┬────────────┐
    │ business_days ┆ trade_days │
    │ ---           ┆ ---        │
    │ i64           ┆ i64        │
    ╞═══════════════╪════════════╡
    │ 9             ┆ 9          │
    │ 30            ┆ 29         │
    │ 51            ┆ 49         │
    │ 74            ┆ 72         │
    │ 94            ┆ 91         │
    └───────────────┴────────────┘