pythondataframedatetimegroupingpython-polars

Python - Subtract Date by First Entry?


I'm using polars library in python to manipulate some dataframe.

I'm trying to do the following: :

For some dataframe:

Person.. Fight with On
A B 3 Jan
A C 4 Jan
A D 5 Jan
A E 5 Jan
A B 10 Jan
A B 20 Jan
A C 20 Jan

I want to return the "distance" between the current fighter-pair and the first fight they had, such that: :

Person.. Fight with On Distance
A B 3 Jan 0 Days
A C 4 Jan 0 Days
A D 5 Jan 0 Days
A E 5 Jan 0 Days
A B 10 Jan 7 Days (i.e. 10 Jan - 3 Jan); (CurrentDate - ABFirstFight)
A B 20 Jan 17 Days (i.e. 20 Jan - 3 Jan); (CurrentDate - ABFirstFight)
A C 20 Jan 16 Days (i.e. 20 Jan - 4 Jan); (CurrentDate - ACFirstFight)

<What I've Tried>:

  1. polars "first" function: Only returned the head of the dataframe
  2. polars "first" function with some combinations of "over"/"group_by"/"rolling" functions: Returned some numbers, but I can't make sense of why the output was that way

Does anyone have any advice on how to attempt this?

I think I might need to use some combination of "group_by" or "over", "first", and perhaps "sub" (to subtract two dates?), but I'm not sure how to proceed. The hardest part for me is to try to extract the first entry of a given group (e.g. first date entry of the A-B pair, or the A-C pair, etc.)


Solution

  • Here's one way to do this:

    Group the dataframe by [person, fight_with], select the min on, and then join it with the original dataframe.

    Here's some code:

    import polars as pl
    
    df = pl.DataFrame(
        {
            "person": ["A", "A", "A", "A", "A", "A", "A"],
            "fight_with": ["B", "C", "D", "E", "B", "B", "C"],
            "on": [3, 4, 5, 5, 10, 20, 20],
        }
    )
    
    first_fight = df.group_by("person", "fight_with").agg(pl.min("on").alias("min_on"))
    
    df = df.join(first_fight, on=["person", "fight_with"]).with_columns(
        distance=pl.col("on") - pl.col("min_on")
    )
    
    print(df)
    

    Output:

    shape: (7, 5)
    ┌────────┬────────────┬─────┬────────┬──────────┐
    │ person ┆ fight_with ┆ on  ┆ min_on ┆ distance │
    │ ---    ┆ ---        ┆ --- ┆ ---    ┆ ---      │
    │ str    ┆ str        ┆ i64 ┆ i64    ┆ i64      │
    ╞════════╪════════════╪═════╪════════╪══════════╡
    │ A      ┆ B          ┆ 3   ┆ 3      ┆ 0        │
    │ A      ┆ C          ┆ 4   ┆ 4      ┆ 0        │
    │ A      ┆ D          ┆ 5   ┆ 5      ┆ 0        │
    │ A      ┆ E          ┆ 5   ┆ 5      ┆ 0        │
    │ A      ┆ B          ┆ 10  ┆ 3      ┆ 7        │
    │ A      ┆ B          ┆ 20  ┆ 3      ┆ 17       │
    │ A      ┆ C          ┆ 20  ┆ 4      ┆ 16       │
    └────────┴────────────┴─────┴────────┴──────────┘