
How to filter df by value list with Polars?


I have a Polars DataFrame read from a CSV file, and I am trying to filter it by a list of values:

import polars as pl

list = [1, 2, 4, 6, 48]

df = (
    pl.read_csv("bm.dat", separator=';', new_columns=["cid1", "cid2", "cid3"])
    .lazy()
    .filter((pl.col("cid1") in list) & (pl.col("cid2") in list))
    .collect()
)

I receive an error:

ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.

When I comment out .lazy() and .collect(), I still get the same error.

I also tried a single condition, .filter(pl.col("cid1") in list), and got the same error again.

How can I filter a DataFrame by a list of values in Polars?


Solution

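  • The error comes from Python's in operator. To evaluate pl.col("cid1") in list, Python compares each list element with the expression using ==, which returns another Expr rather than True/False, and then calls bool() on that Expr; Polars refuses the conversion because an expression has no truth value on its own. This is also why removing .lazy() and .collect() does not help: pl.col("cid1") is an Expr in eager mode too. In Polars, use the is_in expression method instead, which tests membership element-wise.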

    For example:

    import polars as pl
    
    df = pl.DataFrame(
        {
            "cid1": [1, 2, 3],
            "cid2": [4, 5, 6],
            "cid3": [7, 8, 9],
        }
    )
    
    list = [1, 2, 4, 6, 48]
    
    # is_in builds a boolean mask per column; & keeps rows where both are true
    (
        df.lazy()
        .filter(pl.col("cid1").is_in(list) & pl.col("cid2").is_in(list))
        .collect()
    )
    
    shape: (1, 3)
    ┌──────┬──────┬──────┐
    │ cid1 ┆ cid2 ┆ cid3 │
    │ ---  ┆ ---  ┆ ---  │
    │ i64  ┆ i64  ┆ i64  │
    ╞══════╪══════╪══════╡
    │ 1    ┆ 4    ┆ 7    │
    └──────┴──────┴──────┘
    

    If we use the in operator instead, we get the same error:

    (
        df.lazy()
        .filter((pl.col("cid1") in list) & (pl.col("cid2") in list))
        .collect()
    )
    
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
      File "/home/corey/.virtualenvs/StackOverflow/lib/python3.10/site-packages/polars/internals/expr/expr.py", line 155, in __bool__
        raise ValueError(
    ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.