In pandas it is possible to broadcast a single value to an entire column or even a slice:
frame.loc[start_index:stop_index, 'a'] = frame.loc[some_row_index, 'a']
That is, a single value is broadcast to a Series.
I tried something similar with Polars:
frame = frame.with_columns(
    pl.when(
        pl.col("Time").is_between(datetime(2022, 4, 21), datetime(2022, 4, 23))
    )
    .then(
        pl.lit(
            frame.filter(pl.col("Time") == datetime(2022, 4, 20)).select(
                "col"
            )
        )
    )
    .otherwise(pl.col("col"))
    .alias("col")
)
but I get the following error:
ValueError: could not convert value 'shape: (1, 1)\n┌────────┐\n│ col │\n│ --- │\n│ i64 │\n╞════════╡\n│ 14 │\n└────────┘' as a Literal
If I just use an integer like pl.lit(6) in the assignment, it works fine though. How can I broadcast a single cell value to a column or a slice of a column?
Edit: OK, so apparently indexing into the shape (1, 1) DataFrame like so
frame.filter(pl.col("Time") == datetime(2022, 4, 20)).select("col")[0,0]
and passing the result to pl.lit works. But given that the documentation is rather insistent about not using square bracket notation, is there perhaps a better way?
You are quite right to avoid square bracket notation. A better way to do this is to extract the single value you want as a Series. Polars will broadcast a length-1 Series across the column in a when/then/otherwise expression.
For instance, let's start with this data:
from datetime import datetime
import polars as pl
df = pl.DataFrame(
    {
        "Time": pl.date_range(datetime(2022, 4, 18), datetime(2022, 4, 25), "1d", eager=True),
        "value": pl.int_range(0, 8, eager=True),
    }
)
df
shape: (8, 2)
┌────────────┬───────┐
│ Time       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2022-04-18 ┆ 0     │
│ 2022-04-19 ┆ 1     │
│ 2022-04-20 ┆ 2     │
│ 2022-04-21 ┆ 3     │
│ 2022-04-22 ┆ 4     │
│ 2022-04-23 ┆ 5     │
│ 2022-04-24 ┆ 6     │
│ 2022-04-25 ┆ 7     │
└────────────┴───────┘
After filtering, I'll extract the value as a Series using the get_column method.
s = df.filter(pl.col("Time") == datetime(2022, 4, 20)).get_column('value')
s
shape: (1,)
Series: 'value' [i64]
[
2
]
Notice above that this approach preserves the datatype of your column (i64). By contrast, square bracket notation converts the Polars value to a Python object, which Polars must then map back to a Polars datatype in the when/then/otherwise. (This round trip sometimes leads to problems.)
In the when/then/otherwise, Polars will broadcast the single value in s:
(
    df
    .with_columns(
        pl.when(pl.col("Time").is_between(
            datetime(2022, 4, 22),
            datetime(2022, 4, 24),
        ))
        .then(s)
        .otherwise(pl.col("value"))
        .alias("result")
    )
)
shape: (8, 3)
┌────────────┬───────┬────────┐
│ Time       ┆ value ┆ result │
│ ---        ┆ ---   ┆ ---    │
│ date       ┆ i64   ┆ i64    │
╞════════════╪═══════╪════════╡
│ 2022-04-18 ┆ 0     ┆ 0      │
│ 2022-04-19 ┆ 1     ┆ 1      │
│ 2022-04-20 ┆ 2     ┆ 2      │
│ 2022-04-21 ┆ 3     ┆ 3      │
│ 2022-04-22 ┆ 4     ┆ 2      │
│ 2022-04-23 ┆ 5     ┆ 2      │
│ 2022-04-24 ┆ 6     ┆ 2      │
│ 2022-04-25 ┆ 7     ┆ 7      │
└────────────┴───────┴────────┘