Polars (Python): how to handle a variable number of conditions when creating a new column

I have large datasets (ranging from 100k to 4 million rows) in which I look for relevant codes across multiple columns. For example, to flag each row in which any of the columns starts with a given string (such as '302'), I would do:

import polars as pl

df = pl.DataFrame({
'Codes_1': ['302E513', '301E513', '302E512'],
'Codes_2': ['303E513', '306E510', '302E512']}).lazy()

conditions = ['302E513', '306E510']
column_names = ['Codes_1', 'Codes_2']

#create new column
df = df.with_columns(
   pl.when(pl.any_horizontal( 
       pl.col(column_names).str.starts_with(conditions[0]),
       pl.col(column_names).str.starts_with(conditions[1])))
   .then(1.0)
   .otherwise(0.0)
   .alias('Column_name')
)

It is tedious, when I am looking for, say, 4 codes instead of 2, to have to type out each code by hand to build the new column:

import polars as pl

df = pl.DataFrame({
'Codes_1': ['302E513', '301E513', '302E512'],
'Codes_2': ['303E513', '306E510', '302E512']}).lazy()

conditions = ['302E513', '306E510', '5164E23', '302E514']
column_names = ['Codes_1', 'Codes_2']

#create new column
df = df.with_columns(
   pl.when(pl.any_horizontal(
       #Tedious part 
       pl.col(column_names).str.starts_with(conditions[0]),
       pl.col(column_names).str.starts_with(conditions[1]),
       pl.col(column_names).str.starts_with(conditions[2]),
       pl.col(column_names).str.starts_with(conditions[3])
))
   .then(1.0)
   .otherwise(0.0)
   .alias('Column_name')
)

I know that this can be done in pandas by updating a mask in a for loop:

import pandas as pd

df = pd.DataFrame({
'Codes_1': ['302E513', '301E513', '302E512'],
'Codes_2': ['303E513', '306E510', '302E512']})

conditions = ['302E513', '306E510']
column_names = ['Codes_1', 'Codes_2']

#loop to create new column
mask = False
for code in conditions:
   mask |= df[column_names].eq(code).any(axis=1)

df['Column_name'] = 0.0
df.loc[mask, 'Column_name'] = 1.0
print(df['Column_name'])

With this approach I can change the number of conditions to any number and the code still runs. However, I would much rather use Polars, as it is faster and does not exhaust the RAM on my machine for larger datasets. Any help is appreciated.

Solution

You could replace the multiple str.starts_with calls with a single regex passed to str.contains:

df.with_columns(
   pl.when(pl.any_horizontal(
       pl.col(column_names).str.contains(f"^({'|'.join(conditions)})"),
))
   .then(1.0)
   .otherwise(0.0)
   .alias('Column_name')
)

Or use a loop:

df.with_columns(
   pl.when(pl.any_horizontal(
       pl.col(column_names).str.starts_with(c)
        for c in conditions
))
   .then(1.0)
   .otherwise(0.0)
   .alias('Column_name')
)

Intermediate (the pattern built for the regex version, using the four conditions):

# f"^({'|'.join(conditions)})"
'^(302E513|306E510|5164E23|302E514)'

Output (non-lazy, or after calling .collect() on the lazy frame):

┌─────────┬─────────┬─────────────┐
│ Codes_1 ┆ Codes_2 ┆ Column_name │
│ ---     ┆ ---     ┆ ---         │
│ str     ┆ str     ┆ f64         │
╞═════════╪═════════╪═════════════╡
│ 302E513 ┆ 303E513 ┆ 1.0         │
│ 301E513 ┆ 306E510 ┆ 1.0         │
│ 302E512 ┆ 302E512 ┆ 0.0         │
└─────────┴─────────┴─────────────┘