pythonpython-polarspolars

How can I iterate over all columns using pl.all() in Polars?


I've written a custom function in Polars to generate a horizontal forward/backward fill list of expressions. The function accepts an iterable of expressions (or column names) to determine the order of filling. I want to to use all columns via pl.all() as default. The problem is that pl.all() returns a single expression rather than an iterable, so trying to reverse or iterate over it leads to a TypeError.

Is there a way to convert between single expressions and iterables of expressions? Any suggestions or workarounds are greatly appreciated!

Here is the function:

from typing import Iterable
from polars._typing import IntoExpr
import polars as pl

def fill_horizontal(exprs: Iterable[IntoExpr], forward: bool = True) -> list[pl.Expr]:
    """Generate a horizontal forward/backward fill list of expressions."""
    # exprs = exprs or pl.all()  # use all columns as default
    cols = [col for col in reversed(exprs)] if forward else exprs
    return [pl.coalesce(cols[i:]) for i in range(0, len(cols) - 1)]

Here is an example:

df = pl.DataFrame({
    "col1": [1, None, 2],
    "col2": [1, 2, None],
    "col3": [None, None, 3]})
print(df)
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ ---  ┆ ---  ┆ ---  │
# │ i64  ┆ i64  ┆ i64  │
# ╞══════╪══════╪══════╡
# │ 1    ┆ 1    ┆ null │
# │ null ┆ 2    ┆ null │
# │ 2    ┆ null ┆ 3    │
# └──────┴──────┴──────┘
print('forward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=True)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ ---  ┆ ---  ┆ ---  │
# │ i64  ┆ i64  ┆ i64  │
# ╞══════╪══════╪══════╡
# │ 1    ┆ 1    ┆ 1    │
# │ null ┆ 2    ┆ 2    │
# │ 2    ┆ 2    ┆ 3    │
# └──────┴──────┴──────┘
print('backward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=False)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ ---  ┆ ---  ┆ ---  │
# │ i64  ┆ i64  ┆ i64  │
# ╞══════╪══════╪══════╡
# │ 1    ┆ 1    ┆ null │
# │ 2    ┆ 2    ┆ null │
# │ 2    ┆ 3    ┆ 3    │
# └──────┴──────┴──────┘

Edit: Merging @Henry Harbeck's answer and @jqurious's comment seems to be not perfect but a sufficient solution as of now.

def fill_horizontal(
        exprs: Iterable[IntoExpr] | None = None,
        *,
        forward: bool = True,
        ncols: int = 1000) -> pl.Expr:
    """Generate a horizontal forward/backward fill expression."""
    if exprs is None:
        # if forward is false, ncols has to be defined with the present number of cols or more
        cols = pl.all() if forward else pl.nth(range(ncols, -1, -1))
    else:
        cols = exprs if forward else reversed(exprs)
    return pl.cum_reduce(lambda s1, s2: pl.coalesce(s2, s1), cols).struct.unnest()

Solution

  • Check out cum_reduce, which does a cumulative horizontal reduction. This is pretty much what you are after and saves you having to do any Python looping.

    Unfortunately, it reduces from left to right only. I've made this feature request to ask for right to left reductions, which should fully enable your use-case.

    Here's a tweaked version of your function that works in a cases except pl.all() and forward=False

    def fill_horizontal(
        exprs: Iterable[IntoExpr] | None = None,
        *,
        forward: bool = True
    ) -> pl.Expr:
        """Generate a horizontal forward/backward fill list of expressions."""
        exprs = exprs or [pl.all()]  # use all columns as default
        # Doesn't do anything for pl.all() - columns still remain in their original order 
        cols = exprs if forward else reversed(exprs)
        return pl.cum_reduce(lambda s1, s2: pl.coalesce(s2, s1), cols).struct.unnest()
    
    df.with_columns(fill_horizontal())
    # shape: (3, 3)
    # ┌──────┬──────┬──────┐
    # │ col1 ┆ col2 ┆ col3 │
    # │ ---  ┆ ---  ┆ ---  │
    # │ i64  ┆ i64  ┆ i64  │
    # ╞══════╪══════╪══════╡
    # │ 1    ┆ 1    ┆ 1    │
    # │ null ┆ 2    ┆ 2    │
    # │ 2    ┆ 2    ┆ 3    │
    # └──────┴──────┴──────┘
    
    # Doesn't work :(
    df.with_columns(fill_horizontal(forward=False))
    
    # Works as a backward fill
    df.with_columns(fill_horizontal(df.columns, forward=False))
    

    Other options I can think of are: