Polars: Remove substring from column based on another column

Is there a Polars-native way to optimize the apply-lambda approach from the post "Remove substring from column based on another column"?

In the Polars DataFrame below, how can we remove the "_" + sub substring from origin, based on the value in sub?

import polars as pl

# Assign the frame to df so the snippets in the answer can reference it
df = pl.DataFrame(
    {"origin": ["id1_COUNTRY", "id2_NAME"],
     "sub": ["COUNTRY", "NAME"]}
)
print(df)

shape: (2, 2)
┌─────────────┬─────────┐
│ origin      ┆ sub     │
│ ---         ┆ ---     │
│ str         ┆ str     │
╞═════════════╪═════════╡
│ id1_COUNTRY ┆ COUNTRY │
│ id2_NAME    ┆ NAME    │
└─────────────┴─────────┘

The expected output should look like:

shape: (2, 3)
┌─────────────┬─────────┬─────┐
│ origin      ┆ sub     ┆ out │
│ ---         ┆ ---     ┆ --- │
│ str         ┆ str     ┆ str │
╞═════════════╪═════════╪═════╡
│ id1_COUNTRY ┆ COUNTRY ┆ id1 │
│ id2_NAME    ┆ NAME    ┆ id2 │
└─────────────┴─────────┴─────┘

Solution

In the given example, you are only stripping a suffix, so str.strip_suffix() is enough. It accepts an expression, so the suffix can be built from the sub column:

df.with_columns(
   out = pl.col("origin").str.strip_suffix("_" + pl.col("sub"))
)

shape: (2, 3)
┌─────────────┬─────────┬─────┐
│ origin      ┆ sub     ┆ out │
│ ---         ┆ ---     ┆ --- │
│ str         ┆ str     ┆ str │
╞═════════════╪═════════╪═════╡
│ id1_COUNTRY ┆ COUNTRY ┆ id1 │
│ id2_NAME    ┆ NAME    ┆ id2 │
└─────────────┴─────────┴─────┘

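.str.replace_many() can be used for a general "substring" replacement, where the match is not necessarily at the end of the string. In the example below, "_other" is appended to origin first so that the substring to remove sits in the middle: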

df.with_columns(
   out = (pl.col("origin") + "_other")
            .str.replace_many("_" + pl.col("sub"), "")
)

shape: (2, 3)
┌─────────────┬─────────┬───────────┐
│ origin      ┆ sub     ┆ out       │
│ ---         ┆ ---     ┆ ---       │
│ str         ┆ str     ┆ str       │
╞═════════════╪═════════╪═══════════╡
│ id1_COUNTRY ┆ COUNTRY ┆ id1_other │
│ id2_NAME    ┆ NAME    ┆ id2_other │
└─────────────┴─────────┴───────────┘