I'm trying to perform a join_asof
in Polars where it only looks backward in time, ignoring matches on the same row. Specifically, I want it to find the last row where the Team
and Field
match exactly, but only if the Wk
value is less than the current row's Wk
.
The goal is to get the last GeneralLambda
for recent games, either "Away" or "Home," excluding the current game. I thought the best way to do this would be with a join_asof
, but I'm struggling with how to make it only consider past rows and match on both Team
and Field
.
Here is an example DataFrame to illustrate:
Season | Wk | Team | Field | GeneralLambda |
---|---|---|---|---|
2024 | 25 | TeamA | Away | 2.123456 |
2024 | 25 | TeamB | Home | 1.234567 |
2024 | 25 | TeamC | Away | 0.987654 |
2024 | 25 | TeamD | Home | 1.345678 |
2024 | 25 | TeamE | Away | 1.456789 |
2024 | 26 | TeamA | Home | 1.234567 |
2024 | 26 | TeamB | Away | 1.345678 |
2024 | 26 | TeamC | Home | 0.876543 |
2024 | 26 | TeamD | Away | 1.456789 |
2024 | 26 | TeamE | Home | 1.567890 |
2024 | 27 | TeamA | Away | 2.278759 |
2024 | 27 | TeamB | Away | 1.103829 |
2024 | 27 | TeamC | Home | 0.992563 |
2024 | 27 | TeamD | Home | 1.089324 |
2024 | 27 | TeamE | Home | 1.074221 |
For each row, I need to join it with the last matching row where:
Team
is the same,Field
is the same,Wk
is less than the current row’s Wk.I hope the DataFrame looks like this:
Season | Wk | Team | Field | GeneralLambda | Prev_GeneralLambda |
---|---|---|---|---|---|
2024 | 25 | TeamA | Away | 2.123456 | ALastAwayValue |
2024 | 25 | TeamB | Home | 1.234567 | BLastHomeValue |
2024 | 25 | TeamC | Away | 0.987654 | CLastAwayValue |
2024 | 25 | TeamD | Home | 1.345678 | DLastHomeValue |
2024 | 25 | TeamE | Away | 1.456789 | ELastAwayValue |
2024 | 26 | TeamA | Home | 1.234567 | ALastHomeValue |
2024 | 26 | TeamB | Away | 1.345678 | BLastAwayValue |
2024 | 26 | TeamC | Home | 0.876543 | CLastHomeValue |
2024 | 26 | TeamD | Away | 1.456789 | DLastAwayValue |
2024 | 26 | TeamE | Home | 1.567890 | ELastHomeValue |
2024 | 27 | TeamA | Away | 2.278759 | 2.123456 |
2024 | 27 | TeamB | Away | 1.103829 | 1.345678 |
2024 | 27 | TeamC | Home | 0.992563 | 0.876543 |
2024 | 27 | TeamD | Home | 1.089324 | 1.345678 |
2024 | 27 | TeamE | Home | 1.074221 | 1.567890 |
Is there a way to do this in Polars?
(
df.sort("Wk")
.with_columns(
pl.col("GeneralLambda")
.shift(1)
.fill_null(pl.col("Team").str.tail(1) + "Last" + pl.col("Field") + "Value")
.over("Team","Field")
.alias("Prev_GeneralLambda")
)
)
shape: (15, 6)
┌────────┬─────┬───────┬───────┬───────────────┬────────────────────┐
│ Season ┆ Wk ┆ Team ┆ Field ┆ GeneralLambda ┆ Prev_GeneralLambda │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ str ┆ f64 ┆ str │
╞════════╪═════╪═══════╪═══════╪═══════════════╪════════════════════╡
│ 2024 ┆ 25 ┆ TeamA ┆ Away ┆ 2.123456 ┆ ALastAwayValue │
│ 2024 ┆ 25 ┆ TeamB ┆ Home ┆ 1.234567 ┆ BLastHomeValue │
│ 2024 ┆ 25 ┆ TeamC ┆ Away ┆ 0.987654 ┆ CLastAwayValue │
│ 2024 ┆ 25 ┆ TeamD ┆ Home ┆ 1.345678 ┆ DLastHomeValue │
│ 2024 ┆ 25 ┆ TeamE ┆ Away ┆ 1.456789 ┆ ELastAwayValue │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 2024 ┆ 27 ┆ TeamA ┆ Away ┆ 2.278759 ┆ 2.123456 │
│ 2024 ┆ 27 ┆ TeamB ┆ Away ┆ 1.103829 ┆ 1.345678 │
│ 2024 ┆ 27 ┆ TeamC ┆ Home ┆ 0.992563 ┆ 0.876543 │
│ 2024 ┆ 27 ┆ TeamD ┆ Home ┆ 1.089324 ┆ 1.345678 │
│ 2024 ┆ 27 ┆ TeamE ┆ Home ┆ 1.074221 ┆ 1.56789 │
└────────┴─────┴───────┴───────┴───────────────┴────────────────────┘