pythondataframejoinpython-polars

Filter or join a polars dataframe by columns from another dataframe


I have two pl.DataFrames:

from datetime import date

import polars as pl

df1 = pl.DataFrame(
    {
        "symbol": [
            "sec1", "sec1", "sec1", "sec1", "sec1", "sec1",
            "sec2", "sec2", "sec2", "sec2", "sec2",
        ],
        "date": [
            date(2021, 9, 14),
            date(2021, 9, 15),
            date(2021, 9, 16),
            date(2021, 9, 17),
            date(2021, 8, 31),
            date(2020, 12, 31),
            date(2021, 9, 14),
            date(2021, 9, 15),
            date(2021, 8, 31),
            date(2021, 12, 30),
            date(2020, 12, 31),
        ],
        "price": range(11),
    }
)

df2 = pl.DataFrame(
    {
        "symbol": ["sec1", "sec2"],
        "current_date": [date(2021, 9, 17), date(2021, 9, 15)],
        "mtd": [date(2021, 8, 31), date(2021, 8, 31)],
        "ytd": [date(2020, 12, 31), date(2020, 12, 30)],
    }
)

with pl.Config(tbl_rows=-1):
    print(df1)
    print(df2)

shape: (11, 3)
┌────────┬────────────┬───────┐
│ symbol ┆ date       ┆ price │
│ ---    ┆ ---        ┆ ---   │
│ str    ┆ date       ┆ i64   │
╞════════╪════════════╪═══════╡
│ sec1   ┆ 2021-09-14 ┆ 0     │
│ sec1   ┆ 2021-09-15 ┆ 1     │
│ sec1   ┆ 2021-09-16 ┆ 2     │
│ sec1   ┆ 2021-09-17 ┆ 3     │
│ sec1   ┆ 2021-08-31 ┆ 4     │
│ sec1   ┆ 2020-12-31 ┆ 5     │
│ sec2   ┆ 2021-09-14 ┆ 6     │
│ sec2   ┆ 2021-09-15 ┆ 7     │
│ sec2   ┆ 2021-08-31 ┆ 8     │
│ sec2   ┆ 2021-12-30 ┆ 9     │
│ sec2   ┆ 2020-12-31 ┆ 10    │
└────────┴────────────┴───────┘
shape: (2, 4)
┌────────┬──────────────┬────────────┬────────────┐
│ symbol ┆ current_date ┆ mtd        ┆ ytd        │
│ ---    ┆ ---          ┆ ---        ┆ ---        │
│ str    ┆ date         ┆ date       ┆ date       │
╞════════╪══════════════╪════════════╪════════════╡
│ sec1   ┆ 2021-09-17   ┆ 2021-08-31 ┆ 2020-12-31 │
│ sec2   ┆ 2021-09-15   ┆ 2021-08-31 ┆ 2020-12-30 │
└────────┴──────────────┴────────────┴────────────┘

I need to filter the prices of df1 for each group with the respective dates from df2. I need to incorporate all columns of type date. The number of these columns in df2 might not be fixed.

I am looking for the following result:

shape: (11, 3)
┌────────┬────────────┬───────┐
│ symbol ┆ date       ┆ price │
│ ---    ┆ ---        ┆ ---   │
│ str    ┆ date       ┆ i64   │
╞════════╪════════════╪═══════╡
│ sec1   ┆ 2021-09-17 ┆ 3     │
│ sec1   ┆ 2021-08-31 ┆ 4     │
│ sec1   ┆ 2020-12-31 ┆ 5     │
│ sec2   ┆ 2021-09-15 ┆ 7     │
│ sec2   ┆ 2021-08-31 ┆ 8     │
│ sec2   ┆ 2020-12-30 ┆ 9     │
└────────┴────────────┴───────┘

I was thinking of filtering df1 by symbol and then do a join operation for every individual date column of df2. I would then subsequently concatenate the resulting dataframes. However, there's probably a much more elegant solution.


Solution

  • You can unpivot, then join:

    df1.join(
        df2.unpivot(index='symbol', value_name='date').drop('variable'),
        on=['symbol', 'date'],
        how='inner',
    )
    

    Output:

    ┌────────┬────────────┬───────┐
    │ symbol ┆ date       ┆ price │
    │ ---    ┆ ---        ┆ ---   │
    │ str    ┆ date       ┆ i64   │
    ╞════════╪════════════╪═══════╡
    │ sec1   ┆ 2021-09-17 ┆ 3     │
    │ sec1   ┆ 2021-08-31 ┆ 4     │
    │ sec1   ┆ 2020-12-31 ┆ 5     │
    │ sec2   ┆ 2021-09-15 ┆ 7     │
    │ sec2   ┆ 2021-08-31 ┆ 8     │
    └────────┴────────────┴───────┘