pythondataframejoinpython-polars

How to perform a join_asof in Polars looking only backward and matching on specific conditions?


I'm trying to perform a join_asof in Polars where it only looks backward in time, ignoring matches on the same row. Specifically, I want it to find the last row where the Team and Field match exactly, but only if the Wk value is less than the current row's Wk.

The goal is to get the last GeneralLambda for recent games, either "Away" or "Home," excluding the current game. I thought the best way to do this would be with a join_asof, but I'm struggling with how to make it only consider past rows and match on both Team and Field.

Here is an example DataFrame to illustrate:

Season Wk Team Field GeneralLambda
2024 25 TeamA Away 2.123456
2024 25 TeamB Home 1.234567
2024 25 TeamC Away 0.987654
2024 25 TeamD Home 1.345678
2024 25 TeamE Away 1.456789
2024 26 TeamA Home 1.234567
2024 26 TeamB Away 1.345678
2024 26 TeamC Home 0.876543
2024 26 TeamD Away 1.456789
2024 26 TeamE Home 1.567890
2024 27 TeamA Away 2.278759
2024 27 TeamB Away 1.103829
2024 27 TeamC Home 0.992563
2024 27 TeamD Home 1.089324
2024 27 TeamE Home 1.074221

For each row, I need to join it with the last matching row where:

I hope the DataFrame looks like this:

Season Wk Team Field GeneralLambda Prev_GeneralLambda
2024 25 TeamA Away 2.123456 ALastAwayValue
2024 25 TeamB Home 1.234567 BLastHomeValue
2024 25 TeamC Away 0.987654 CLastAwayValue
2024 25 TeamD Home 1.345678 DLastHomeValue
2024 25 TeamE Away 1.456789 ELastAwayValue
2024 26 TeamA Home 1.234567 ALastHomeValue
2024 26 TeamB Away 1.345678 BLastAwayValue
2024 26 TeamC Home 0.876543 CLastHomeValue
2024 26 TeamD Away 1.456789 DLastAwayValue
2024 26 TeamE Home 1.567890 ELastHomeValue
2024 27 TeamA Away 2.278759 2.123456
2024 27 TeamB Away 1.103829 1.345678
2024 27 TeamC Home 0.992563 0.876543
2024 27 TeamD Home 1.089324 1.345678
2024 27 TeamE Home 1.074221 1.567890

Is there a way to do this in Polars?


Solution

  • (
        df.sort("Wk")
        .with_columns(
            pl.col("GeneralLambda")
            .shift(1)
            .fill_null(pl.col("Team").str.tail(1) + "Last" + pl.col("Field") + "Value")
            .over("Team","Field")
            .alias("Prev_GeneralLambda")
        )
    )
    
    shape: (15, 6)
    ┌────────┬─────┬───────┬───────┬───────────────┬────────────────────┐
    │ Season ┆ Wk  ┆ Team  ┆ Field ┆ GeneralLambda ┆ Prev_GeneralLambda │
    │ ---    ┆ --- ┆ ---   ┆ ---   ┆ ---           ┆ ---                │
    │ i64    ┆ i64 ┆ str   ┆ str   ┆ f64           ┆ str                │
    ╞════════╪═════╪═══════╪═══════╪═══════════════╪════════════════════╡
    │ 2024   ┆ 25  ┆ TeamA ┆ Away  ┆ 2.123456      ┆ ALastAwayValue     │
    │ 2024   ┆ 25  ┆ TeamB ┆ Home  ┆ 1.234567      ┆ BLastHomeValue     │
    │ 2024   ┆ 25  ┆ TeamC ┆ Away  ┆ 0.987654      ┆ CLastAwayValue     │
    │ 2024   ┆ 25  ┆ TeamD ┆ Home  ┆ 1.345678      ┆ DLastHomeValue     │
    │ 2024   ┆ 25  ┆ TeamE ┆ Away  ┆ 1.456789      ┆ ELastAwayValue     │
    │ …      ┆ …   ┆ …     ┆ …     ┆ …             ┆ …                  │
    │ 2024   ┆ 27  ┆ TeamA ┆ Away  ┆ 2.278759      ┆ 2.123456           │
    │ 2024   ┆ 27  ┆ TeamB ┆ Away  ┆ 1.103829      ┆ 1.345678           │
    │ 2024   ┆ 27  ┆ TeamC ┆ Home  ┆ 0.992563      ┆ 0.876543           │
    │ 2024   ┆ 27  ┆ TeamD ┆ Home  ┆ 1.089324      ┆ 1.345678           │
    │ 2024   ┆ 27  ┆ TeamE ┆ Home  ┆ 1.074221      ┆ 1.56789            │
    └────────┴─────┴───────┴───────┴───────────────┴────────────────────┘