I am trying to create a Polars DataFrame that includes a column of structs based on another DataFrame column. Here's the setup:
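import polars as pl

# Build the columns from strings and parse them to dates; passing dtype=pl.Date
# together with .str.to_date() is contradictory, so the dtype argument is dropped.
df = pl.DataFrame(
    [
        pl.Series("start", ["2023-01-01"]).str.to_date(),
        pl.Series("end", ["2024-01-01"]).str.to_date(),
    ]
)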
shape: (1, 2)
┌────────────┬────────────┐
│ start ┆ end │
│ --- ┆ --- │
│ date ┆ date │
╞════════════╪════════════╡
│ 2023-01-01 ┆ 2024-01-01 │
└────────────┴────────────┘
df = df.with_columns(
    pl.date_range(pl.col("start"), pl.col("end"), "1mo", closed="left")
    .implode()
    .alias("date_range")
)
shape: (1, 3)
┌────────────┬────────────┬─────────────────────────────────┐
│ start ┆ end ┆ date_range │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ list[date] │
╞════════════╪════════════╪═════════════════════════════════╡
│ 2023-01-01 ┆ 2024-01-01 ┆ [2023-01-01, 2023-02-01, … 202… │
└────────────┴────────────┴─────────────────────────────────┘
Now, I want to make a struct out of the year/month parts:
df = df.with_columns(
    pl.col("date_range")
    .list.eval(
        pl.struct(
            {
                "year": pl.element().dt.year(),
                "month": pl.element().dt.month(),
            }
        )
    )
    .alias("years_months")
)
But this does not work. Maybe I ought not to implode the output of date_range into a list, but I am not sure how to create a struct directly from its result either.
My best idea is one I don't like, because I have to call list.eval repeatedly:
df = (
    df.with_columns(
        pl.col("date_range").list.eval(pl.element().dt.year()).alias("year"),
        pl.col("date_range").list.eval(pl.element().dt.month()).alias("month"),
    )
    .drop("start", "end", "date_range")
    .explode("year", "month")
    .select(pl.struct("year", "month"))
)
df
The other idea is to use map_elements, but I think that ought to be something of a last resort. What's the idiomatic way to eval into a struct?
You shouldn't pass a dictionary to the struct constructor. Pass each IntoExpr as a keyword argument instead, that is, pl.struct(key1=IntoExprA, key2=IntoExprB, ...). The keyword names become the struct's field names.
In your case, you could replace your struct-ification code with the following:
df = df.with_columns(
    pl.col("date_range")
    .list.eval(
        pl.struct(
            year=pl.element().dt.year(),
            month=pl.element().dt.month(),
        )
    )
    .alias("years_months")
)
Printing df gave me this output:
┌────────────┬────────────┬─────────────────────────────────┬─────────────────────────────────┐
│ start ┆ end ┆ date_range ┆ years_months │
│ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ date ┆ list[date] ┆ list[struct[2]] │
╞════════════╪════════════╪═════════════════════════════════╪═════════════════════════════════╡
│ 2023-01-01 ┆ 2024-01-01 ┆ [2023-01-01, 2023-02-01, … 202… ┆ [{2023,1}, {2023,2}, … {2023,1… │
└────────────┴────────────┴─────────────────────────────────┴─────────────────────────────────┘
If your dictionary was constructed earlier in the program, you can use unpacking to get the kwargs back out:
df = df.with_columns(
    pl.col("date_range")
    .list.eval(
        pl.struct(**example_dict)
    )
    .alias("years_months")
)