Tags: contains, python-polars, isin

How do I filter to rows of strings that contain a value from a list in Polars?


Suppose you have a list of values and a Polars dataframe with a column of text. How would you filter to only the rows whose text contains at least one item from the list?

import polars as pl

a_list = ['a', 'b', 'c']

df = pl.DataFrame({
    'col1': [
        'I am just a string', 
        'one more, but without the letters', 
        'we want, a, b, c,', 
        'Nothing here'
    ]
})

Expected output:

shape: (3, 1)
┌───────────────────────────────────┐
│ col1                              │
│ ---                               │
│ str                               │
╞═══════════════════════════════════╡
│ I am just a string                │
│ one more, but without the letter… │
│ we want, a, b, c,                 │
└───────────────────────────────────┘

I assume it involves some combination of .is_in(a_list) and .str.contains(), but I haven't been able to make it work.


Solution

  • I would use str.contains_any(), which checks each string for any of the given patterns:

    import polars as pl
    
    a_list = ['a', 'b', 'c']
    
    df = pl.DataFrame({
        'col1': ['I am just a string', 'one more, but without the letters', 'we want, a, b, c,', 'Nothing here']
    })
    
    df.filter(pl.col('col1').str.contains_any(a_list))
    

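    This keeps the whole filter inside the Polars expression API and is easy to read. Note that contains_any() treats the list items as literal substrings, not as regular expressions.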