python-polars

Broadcast a single cell value to a column


In pandas it is possible to broadcast a single value to an entire column or even a slice:

frame.loc[start_index:stop_index, 'a'] = frame.loc[some_row_index, 'a']

that is, a single value being broadcast to a Series.

I tried something similar in Polars:

frame = frame.with_columns(
    pl.when(
        pl.col("Time").is_between(datetime(2022, 4, 21), datetime(2022, 4, 23))
    )
    .then(
        pl.lit(
            frame.filter(pl.col("Time") == datetime(2022, 4, 20)).select(
                "col"
            )
        )
    )
    .otherwise(pl.col("col"))
    .alias("col")
)

but I get the following error:

ValueError: could not convert value 'shape: (1, 1)\n┌────────┐\n│ col │\n│ ---    │\n│ i64    │\n╞════════╡\n│ 14     │\n└────────┘' as a Literal

If I use a plain integer such as pl.lit(6) instead, it works fine. How can I broadcast a single cell value to a whole column, or to a slice of a column?


Edit: OK, it turns out that indexing into the shape (1, 1) DataFrame like so

frame.filter(pl.col("Time") == datetime(2022, 4, 20)).select("col")[0,0]

and wrapping the result in pl.lit works. But given that the documentation explicitly discourages square bracket indexing, is there a better way?


Solution

  • You are quite right to avoid square bracket notation. A better approach is to extract the single value as a Series rather than as a Python scalar: Polars broadcasts a length-1 Series in a when/then/otherwise expression.

    For instance, let's start with this data:

    from datetime import datetime
    import polars as pl
    
    df = pl.DataFrame(
        {
            "Time": pl.date_range(datetime(2022, 4, 18), datetime(2022, 4, 25), "1d", eager=True),
            'value': pl.int_range(0, 8, eager=True),
        }
    )
    df
    
    shape: (8, 2)
    ┌────────────┬───────┐
    │ Time       ┆ value │
    │ ---        ┆ ---   │
    │ date       ┆ i64   │
    ╞════════════╪═══════╡
    │ 2022-04-18 ┆ 0     │
    │ 2022-04-19 ┆ 1     │
    │ 2022-04-20 ┆ 2     │
    │ 2022-04-21 ┆ 3     │
    │ 2022-04-22 ┆ 4     │
    │ 2022-04-23 ┆ 5     │
    │ 2022-04-24 ┆ 6     │
    │ 2022-04-25 ┆ 7     │
    └────────────┴───────┘
    

    After filtering, I'll extract the value as a Series using the get_column method:

    s = df.filter(pl.col("Time") == datetime(2022, 4, 20)).get_column('value')
    s
    
    shape: (1,)
    Series: 'value' [i64]
    [
            2
    ]
    

    Notice above that this approach preserves the datatype of your column (i64). By contrast, square bracket notation converts the Polars value to a Python object, which Polars must then map back to a Polars datatype in the when/then/otherwise. That round trip can lose information: the dtype Polars infers for the Python value need not match the column's original dtype.

    In the when/then/otherwise, Polars will broadcast the single value in s:

    (
        df
        .with_columns(
            pl.when(pl.col("Time").is_between(
                datetime(2022, 4, 22),
                datetime(2022, 4, 24),
            ))
            .then(s)
            .otherwise(pl.col("value"))
            .alias("result")
        )
    )
    
    shape: (8, 3)
    ┌────────────┬───────┬────────┐
    │ Time       ┆ value ┆ result │
    │ ---        ┆ ---   ┆ ---    │
    │ date       ┆ i64   ┆ i64    │
    ╞════════════╪═══════╪════════╡
    │ 2022-04-18 ┆ 0     ┆ 0      │
    │ 2022-04-19 ┆ 1     ┆ 1      │
    │ 2022-04-20 ┆ 2     ┆ 2      │
    │ 2022-04-21 ┆ 3     ┆ 3      │
    │ 2022-04-22 ┆ 4     ┆ 2      │
    │ 2022-04-23 ┆ 5     ┆ 2      │
    │ 2022-04-24 ┆ 6     ┆ 2      │
    │ 2022-04-25 ┆ 7     ┆ 7      │
    └────────────┴───────┴────────┘