
In python-polars, how do I search for a string across multiple columns and create a flag column that shows whether it was found in any of them?


The following code searches multiple columns and creates a flag column that marks the rows where the string was found. It works, but is there a more compact way to do the same thing inside with_columns()?

import polars as pl

df = pl.DataFrame({
    "col1": ["hello", "world", "polars"],
    "col2": ["data", "science", "hello"],
    "col3": ["test", "string", "match"],
    "col4": ["hello", "example", "test"]
})


search_string = "hello"

condition = pl.lit(False)


for col in df.columns:
    condition |= pl.col(col).str.contains(search_string)

df = df.with_columns(
    # Cast the boolean condition to an integer flag: 1 if found, 0 otherwise
    condition.cast(pl.Int32).alias("string_found")
)


print(df)

shape: (3, 5)
┌────────┬─────────┬────────┬─────────┬──────────────┐
│ col1   ┆ col2    ┆ col3   ┆ col4    ┆ string_found │
│ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---          │
│ str    ┆ str     ┆ str    ┆ str     ┆ i32          │
╞════════╪═════════╪════════╪═════════╪══════════════╡
│ hello  ┆ data    ┆ test   ┆ hello   ┆ 1            │
│ world  ┆ science ┆ string ┆ example ┆ 0            │
│ polars ┆ hello   ┆ match  ┆ test    ┆ 1            │
└────────┴─────────┴────────┴─────────┴──────────────┘

Solution

  • You can use pl.any_horizontal():

    df.with_columns(
        pl.any_horizontal(pl.all().str.contains(search_string))
          .alias("string_found")
    )
    
    shape: (3, 5)
    ┌────────┬─────────┬────────┬─────────┬──────────────┐
    │ col1   ┆ col2    ┆ col3   ┆ col4    ┆ string_found │
    │ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---          │
    │ str    ┆ str     ┆ str    ┆ str     ┆ bool         │
    ╞════════╪═════════╪════════╪═════════╪══════════════╡
    │ hello  ┆ data    ┆ test   ┆ hello   ┆ true         │
    │ world  ┆ science ┆ string ┆ example ┆ false        │
    │ polars ┆ hello   ┆ match  ┆ test    ┆ true         │
    └────────┴─────────┴────────┴─────────┴──────────────┘
    

    You can replace pl.all() with pl.col(pl.String) to limit the expression to String columns only.

    In this example every column is a String column, so it makes no difference.