Suppose you have a list of values and a Polars DataFrame with a column of text, and you want to keep only the rows whose text contains any item from the list. How would you write that?
import polars as pl

a_list = ['a', 'b', 'c']
df = pl.DataFrame({
    'col1': [
        'I am just a string',
        'one more, but without the letters',
        'we want, a, b, c,',
        'Nothing here'
    ]
})
Expected output:
shape: (3, 1)
┌───────────────────────────────────┐
│ col1                              │
│ ---                               │
│ str                               │
╞═══════════════════════════════════╡
│ I am just a string                │
│ one more, but without the letter… │
│ we want, a, b, c,                 │
└───────────────────────────────────┘
I assume it would involve some combination of .is_in(a_list) and .str.contains(), but I haven't been able to make it work.
I would use .str.contains_any(), like:
import polars as pl

a_list = ['a', 'b', 'c']
df = pl.DataFrame({
    'col1': ['I am just a string', 'one more, but without the letters', 'we want, a, b, c,', 'Nothing here']
})
# Keep rows whose text contains at least one item from a_list
df.filter(pl.col('col1').str.contains_any(a_list))
This approach is idiomatic Polars and easy to read.