
polars: how to find out the number of columns in a polars expression?


I'm building a package on top of Polars, and one of the functions looks like this:

def func(x: IntoExpr, y: IntoExpr):
   ...

The business logic requires that x can include multiple columns, but y must be a single column.

What should I do to check and validate this?


Solution

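  • You can use the polars.selectors.expand_selector function, which resolves a selector (or, with strict=False, a simple column-selection expression) against a frame and returns the names of the matching columns as a tuple. The length of that tuple is the number of columns selected.

    The drawback is that it only accepts pure column selections: an expression that does anything beyond selecting columns (pl.all() + 1, for example) makes the call fail with a TypeError (see the final example).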

    import polars as pl
    import polars.selectors as cs
    from polars.selectors import expand_selector
    
    data = {
        "a1": [1, 2, 3],
        "a2": [4, 5, 6],
        "b1": [7, 8, 9],
        "b2": [10, 11, 12],
    }
    df = pl.DataFrame(data)
    
    print(
        expand_selector(df, cs.exclude('b1', 'b2')),  # ('a1', 'a2')
        expand_selector(df, cs.starts_with('b')),     # ('b1', 'b2')
        expand_selector(df, cs.matches('(a|b)1$')),   # ('a1', 'b1')
    
        # plain expressions (not selectors) need strict=False
        expand_selector(df, pl.exclude('a1', 'a2'), strict=False), # ('b1', 'b2')
        expand_selector(df, pl.col('b1'),           strict=False), # ('b1',)
        expand_selector(df, pl.all(),               strict=False), # ('a1', 'a2', 'b1', 'b2')
        sep='\n'
    )
    
    # anything beyond a plain column-selection expression will fail
    print(expand_selector(df, pl.all() + 1, strict=False))
    
    # Traceback (most recent call last):
    #   File "/home/cameron/.vim-excerpt", line 26, in <module>
    #     expand_selector(df, pl.all() + 1, strict=False),
    #   File "/home/cameron/.pyenv/versions/dutc-site/lib/python3.10/site-packages/polars/selectors.py", line 190, in expand_selector
    #     raise TypeError(msg)
    # TypeError: expected a selector; found <Expr ['[(*) + (dyn int: 1)]'] at 0x7F835F943D30> instead.