
Filtering a list based on the values of another list in Polars


Let's say I have the following DataFrame:

import polars as pl

df = pl.DataFrame({
    'values': [[0, 1], [9, 8]],
    'qc_flags': [["", "X"], ["T", ""]]
})

Within each row, I want to keep only the elements of values whose corresponding element in qc_flags equals "".

Does anyone know the correct way to go about this?

I've tried something like this:

filtered = df.with_columns(
    pl.col("values").list.eval(
        pl.element().filter(
            pl.col("qc_flags").list.eval(
                pl.element() == ""
            )
        )
    )
)

I expected to get 'values': [[0], [8]], but I get this error instead:

ComputeError: named columns are not allowed in `list.eval`; consider using `element` or `col("")`


Solution

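list.eval only sees the elements of the list it is called on, so it cannot refer to another column such as qc_flags. One way around this is to compute, for each row, the indices where qc_flags equals "" and then gather those positions from values:

df.with_columns(
    filtered = pl.col.values.list.gather(
        pl.col.qc_flags.list.eval(pl.arg_where(pl.element() == ""))
    )
)

shape: (2, 3)
┌───────────┬───────────┬───────────┐
│ values    ┆ qc_flags  ┆ filtered  │
│ ---       ┆ ---       ┆ ---       │
│ list[i64] ┆ list[str] ┆ list[i64] │
╞═══════════╪═══════════╪═══════════╡
│ [0, 1]    ┆ ["", "X"] ┆ [0]       │
│ [9, 8]    ┆ ["T", ""] ┆ [8]       │
└───────────┴───────────┴───────────┘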