To search multiple columns for a string and add a flag column indicating whether it was found, the following code works. Is there a more compact way to achieve the same thing inside with_columns()?
import polars as pl

df = pl.DataFrame({
    "col1": ["hello", "world", "polars"],
    "col2": ["data", "science", "hello"],
    "col3": ["test", "string", "match"],
    "col4": ["hello", "example", "test"],
})
search_string = "hello"
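# Start from False and OR in the match result for each column.
condition = pl.lit(False)
for col in df.columns:
    condition |= pl.col(col).str.contains(search_string)
# Adding 0 turns the boolean flag into an integer (0 or 1).
df = df.with_columns(
    (condition + 0).alias("string_found")
)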
print(df)
shape: (3, 5)
┌────────┬─────────┬────────┬─────────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ string_found │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ i32 │
╞════════╪═════════╪════════╪═════════╪══════════════╡
│ hello ┆ data ┆ test ┆ hello ┆ 1 │
│ world ┆ science ┆ string ┆ example ┆ 0 │
│ polars ┆ hello ┆ match ┆ test ┆ 1 │
└────────┴─────────┴────────┴─────────┴──────────────┘
You can use pl.any_horizontal():
df.with_columns(
    pl.any_horizontal(pl.all().str.contains(search_string))
    .alias("string_found")
)
shape: (3, 5)
┌────────┬─────────┬────────┬─────────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ string_found │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ bool │
╞════════╪═════════╪════════╪═════════╪══════════════╡
│ hello ┆ data ┆ test ┆ hello ┆ true │
│ world ┆ science ┆ string ┆ example ┆ false │
│ polars ┆ hello ┆ match ┆ test ┆ true │
└────────┴─────────┴────────┴─────────┴──────────────┘
You can replace pl.all() with pl.col(pl.String) to limit the expression to String columns only. In this example every column is a String column, so it makes no difference.