
How can I create a Polars struct while eval-ing a list?


I am trying to create a Polars DataFrame that includes a column of structs based on another DataFrame column. Here's the setup:

import polars as pl

df = pl.DataFrame(
    [
        # Build the columns from strings, then parse them into dates.
        pl.Series("start", ["2023-01-01"]).str.to_date(),
        pl.Series("end", ["2024-01-01"]).str.to_date(),
    ]
)

shape: (1, 2)
┌────────────┬────────────┐
│ start      ┆ end        │
│ ---        ┆ ---        │
│ date       ┆ date       │
╞════════════╪════════════╡
│ 2023-01-01 ┆ 2024-01-01 │
└────────────┴────────────┘

df = df.with_columns(
    pl.date_range(pl.col("start"), pl.col("end"), "1mo", closed="left")
    .implode()
    .alias("date_range")
)

shape: (1, 3)
┌────────────┬────────────┬─────────────────────────────────┐
│ start      ┆ end        ┆ date_range                      │
│ ---        ┆ ---        ┆ ---                             │
│ date       ┆ date       ┆ list[date]                      │
╞════════════╪════════════╪═════════════════════════════════╡
│ 2023-01-01 ┆ 2024-01-01 ┆ [2023-01-01, 2023-02-01, … 202… │
└────────────┴────────────┴─────────────────────────────────┘

Now, I want to make a struct out of the year/month parts:

df = df.with_columns(
    pl.col("date_range")
    .list.eval(
        pl.struct(
            {
                "year": pl.element().dt.year(),
                "month": pl.element().dt.month(),
            }
        )
    )
    .alias("years_months")
)

But this does not work.

Maybe I ought not to implode the date_range's output into a list, but I am not sure how to create a struct directly from its result either.

My best idea is one I don't like, because I have to call list.eval repeatedly:

df = (
    df.with_columns(
        pl.col("date_range").list.eval(pl.element().dt.year()).alias("year"),
        pl.col("date_range").list.eval(pl.element().dt.month()).alias("month"),
    )
    .drop("start", "end", "date_range")
    .explode("year", "month")
    .select(pl.struct("year", "month"))
)
df

The other idea is to use map_elements, but I think that ought to be something of a last resort. What's the idiomatic way to eval into a struct?


Solution

  • Don't pass a dictionary to the struct constructor. Pass each IntoExpr as a keyword argument instead - that is, pl.struct(key1=IntoExprA, key2=IntoExprB, ...). Each keyword becomes the name of a struct field.

    In your case, you could replace your struct-ification code with the following:

    df = df.with_columns(
        pl.col("date_range")
        .list.eval(
            pl.struct(
                year=pl.element().dt.year(),
                month=pl.element().dt.month(),
            )
        )
        .alias("years_months")
    )
    

    Printing df gave me this output:

    ┌────────────┬────────────┬─────────────────────────────────┬─────────────────────────────────┐
    │ start      ┆ end        ┆ date_range                      ┆ years_months                    │
    │ ---        ┆ ---        ┆ ---                             ┆ ---                             │
    │ date       ┆ date       ┆ list[date]                      ┆ list[struct[2]]                 │
    ╞════════════╪════════════╪═════════════════════════════════╪═════════════════════════════════╡
    │ 2023-01-01 ┆ 2024-01-01 ┆ [2023-01-01, 2023-02-01, … 202… ┆ [{2023,1}, {2023,2}, … {2023,1… │
    └────────────┴────────────┴─────────────────────────────────┴─────────────────────────────────┘
    

    If your dictionary was constructed earlier in the program, you can use unpacking to get the kwargs back out:

    df = df.with_columns(
        pl.col("date_range")
        .list.eval(
            pl.struct(**example_dict)
        )
        .alias("years_months")
    )