I have the following DataFrame.
df = pl.from_repr("""
┌────────────┬───────────────┐
│ date ┆ maturity_date │
│ --- ┆ --- │
│ date ┆ date │
╞════════════╪═══════════════╡
│ 2000-01-04 ┆ 2000-01-17 │
│ 2000-01-04 ┆ 2000-02-15 │
│ 2000-01-04 ┆ 2000-03-15 │
│ 2000-01-04 ┆ 2000-04-17 │
│ 2000-01-04 ┆ 2000-05-15 │
└────────────┴───────────────┘
""")
I'm trying to get the number of the workdays between date
and maturity_date
(not counting saturday and sunday)
I'd also like to calculate diff days that use a given calendar like a trade date calendar of stock market which is different from a normal calendar.
I use this date_ranges
to count workdays, but it seems only a little faster than map_elements
df.with_columns(
pl.date_ranges("date", "maturity_date")
.list.eval(pl.element().dt.weekday() <= 5)
.list.count_matches(True)
.alias("workdays_diff")
# pl.concat_list("date", "maturity_date").map_elements(lambda x: get_work_days(x[0], x[1]))
# .alias("workdays_diff")
)
shape: (5, 3)
┌────────────┬───────────────┬───────────────┐ # ┌────────────────┐
│ date ┆ maturity_date ┆ workdays_diff │ # │ tradedate_diff │
│ --- ┆ --- ┆ --- │ # │ --- │
│ date ┆ date ┆ u32 │ # │ i64 │
╞════════════╪═══════════════╪═══════════════╡ # ╞════════════════╡
│ 2000-01-04 ┆ 2000-01-17 ┆ 10 │ # │ 10 │
│ 2000-01-04 ┆ 2000-02-15 ┆ 31 │ # │ 21 │
│ 2000-01-04 ┆ 2000-03-15 ┆ 52 │ # │ 42 │
│ 2000-01-04 ┆ 2000-04-17 ┆ 75 │ # │ 65 │
│ 2000-01-04 ┆ 2000-05-15 ┆ 95 │ # │ 80 │
└────────────┴───────────────┴───────────────┘ # └────────────────┘
Is there a faster way?
Is there also a way to calculate tradedate_diff
?
@Dean MacGregor's excellent answer using pure Polars wasn't quite performant enough for my use case.
Using Numpy's built in busday_count
function turned out to be much faster in my case and can easily be converted back to a polars with pl.from_numpy
.
nyse_holidays=pl.Series([
date(2000,1,17),
date(2000,2,21),
date(2000,4,21),
date(2000,5,29),
date(2000,7,4),
date(2000,9,4),
date(2000,11,23),
date(2000,12,25),
])
df = pl.DataFrame(
{
"business_days": pl.from_numpy(np.busday_count(data["date"], data["maturity_date"]),["business_days"])["business_days"],
"trade_days": pl.from_numpy(np.busday_count(data["date"], data["maturity_date"], holidays=nyse_holidays),["trade_days"])["trade_days"],
}
)
shape: (5, 2)
┌───────────────┬────────────┐
│ business_days ┆ trade_days │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════════════╪════════════╡
│ 9 ┆ 9 │
│ 30 ┆ 29 │
│ 51 ┆ 49 │
│ 74 ┆ 72 │
│ 94 ┆ 91 │
└───────────────┴────────────┘