python python-polars

Is there a way to group_by in polars while keeping other columns?


I am trying to do a polars group_by while keeping columns other than the ones passed to group_by.

Here is an example of an input data frame that I have.

import polars as pl

df = pl.from_repr("""
┌─────┬─────┬─────┬─────┐
│ SRC ┆ TGT ┆ IT  ┆ Cd  │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ f64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 2   ┆ 3.0 │
│ 2   ┆ 1   ┆ 2   ┆ 4.0 │
│ 3   ┆ 1   ┆ 2   ┆ 3.0 │
│ 3   ┆ 2   ┆ 1   ┆ 8.0 │
└─────┴─────┴─────┴─────┘
""")

I want to group by ['TGT', 'IT'] and take min('Cd'), which I do with the following code:

df.group_by('TGT', 'IT').agg(pl.col('Cd').min())

With this line, I get the following dataframe.

┌─────┬─────┬─────┐
│ TGT ┆ IT  ┆ Cd  │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 3.0 │
│ 2   ┆ 1   ┆ 8.0 │
└─────┴─────┴─────┘

And here is the dataframe I would like to get instead:

┌─────┬─────┬─────┬─────┐
│ SRC ┆ TGT ┆ IT  ┆ Cd  │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ f64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 2   ┆ 3.0 │
│ 3   ┆ 2   ┆ 1   ┆ 8.0 │
└─────┴─────┴─────┴─────┘

I think I could achieve this by joining the original dataframe onto the grouped one using ['TGT', 'IT', 'Cd'], and then dropping the duplicated rows, as I only want one (any) 'SRC' for each ('TGT', 'IT') pair. But I wanted to know if there is a more straightforward way to do it, ideally by keeping the 'SRC' column during the group_by.

Thanks in advance.


Solution

  • import polars as pl
    
    # Your data
    data = {
        "SRC": [1, 2, 3, 3],
        "TGT": [1, 1, 1, 2],
        "IT": [2, 2, 2, 1],
        "Cd": [3.0, 4.0, 3.0, 8.0]
    }
    
    df = pl.DataFrame(data)
    
    # Perform the group_by and aggregation
    result = (
        df.group_by('TGT', 'IT', maintain_order=True)
        .agg(
            pl.col('SRC').first(),  # first SRC in each group, not necessarily the row holding the minimum Cd
            pl.col('Cd').min()
        )
        .select('SRC', 'TGT', 'IT', 'Cd')  # to reorder columns
    )
    
    print(result)
    
