I've written a custom function in Polars to generate a horizontal forward/backward fill list of expressions. The function accepts an iterable of expressions (or column names) to determine the order of filling. I want to to use all columns via pl.all() as default. The problem is that pl.all() returns a single expression rather than an iterable, so trying to reverse or iterate over it leads to a TypeError.
Is there a way to convert between single expressions and iterables of expressions? Any suggestions or workarounds are greatly appreciated!
Here is the function:
from typing import Iterable
from polars._typing import IntoExpr
import polars as pl
def fill_horizontal(exprs: Iterable[IntoExpr], forward: bool = True) -> list[pl.Expr]:
"""Generate a horizontal forward/backward fill list of expressions."""
# exprs = exprs or pl.all() # use all columns as default
cols = [col for col in reversed(exprs)] if forward else exprs
return [pl.coalesce(cols[i:]) for i in range(0, len(cols) - 1)]
Here is an example:
df = pl.DataFrame({
"col1": [1, None, 2],
"col2": [1, 2, None],
"col3": [None, None, 3]})
print(df)
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞══════╪══════╪══════╡
# │ 1 ┆ 1 ┆ null │
# │ null ┆ 2 ┆ null │
# │ 2 ┆ null ┆ 3 │
# └──────┴──────┴──────┘
print('forward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=True)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞══════╪══════╪══════╡
# │ 1 ┆ 1 ┆ 1 │
# │ null ┆ 2 ┆ 2 │
# │ 2 ┆ 2 ┆ 3 │
# └──────┴──────┴──────┘
print('backward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=False)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞══════╪══════╪══════╡
# │ 1 ┆ 1 ┆ null │
# │ 2 ┆ 2 ┆ null │
# │ 2 ┆ 3 ┆ 3 │
# └──────┴──────┴──────┘
Edit: Merging @Henry Harbeck's answer and @jqurious's comment seems to be not perfect but a sufficient solution as of now.
def fill_horizontal(
exprs: Iterable[IntoExpr] | None = None,
*,
forward: bool = True,
ncols: int = 1000) -> pl.Expr:
"""Generate a horizontal forward/backward fill expression."""
if exprs is None:
# if forward is false, ncols has to be defined with the present number of cols or more
cols = pl.all() if forward else pl.nth(range(ncols, -1, -1))
else:
cols = exprs if forward else reversed(exprs)
return pl.cum_reduce(lambda s1, s2: pl.coalesce(s2, s1), cols).struct.unnest()
Check out cum_reduce
, which does a cumulative horizontal reduction. This is pretty much what you are after and saves you having to do any Python looping.
Unfortunately, it reduces from left to right only. I've made this feature request to ask for right to left reductions, which should fully enable your use-case.
Here's a tweaked version of your function that works in a cases except pl.all()
and forward=False
def fill_horizontal(
exprs: Iterable[IntoExpr] | None = None,
*,
forward: bool = True
) -> pl.Expr:
"""Generate a horizontal forward/backward fill list of expressions."""
exprs = exprs or [pl.all()] # use all columns as default
# Doesn't do anything for pl.all() - columns still remain in their original order
cols = exprs if forward else reversed(exprs)
return pl.cum_reduce(lambda s1, s2: pl.coalesce(s2, s1), cols).struct.unnest()
df.with_columns(fill_horizontal())
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞══════╪══════╪══════╡
# │ 1 ┆ 1 ┆ 1 │
# │ null ┆ 2 ┆ 2 │
# │ 2 ┆ 2 ┆ 3 │
# └──────┴──────┴──────┘
# Doesn't work :(
df.with_columns(fill_horizontal(forward=False))
# Works as a backward fill
df.with_columns(fill_horizontal(df.columns, forward=False))
Other options I can think of are:
pipe
the frame in and access the schema directly. You can then reversthe columns without needing to expose this to the caller. This may block some optimisations / lazy evaluationpl.all()
and pl.col("a", "b")
df.columns
until it hopefully gets implemented