In polars select all column ending with pattern and add new columns without pattern

I have the following dataframe:

import polars as pl
import numpy as np

df = pl.DataFrame({
    "nrs": [1, 2, 3, None, 5],
    "names_A0": ["foo", "ham", "spam", "egg", None],
    "random_A0": np.random.rand(5),
    "A_A2": [True, True, False, False, False],
})
digit = 0

For each column X whose name ends with the string suf =f'_A{digit}', I want to add an identical column to df, whose name is the same as X, but without suf.

In the example, I need to add columns names and random to the original dataframe df, whose content is identical to that of columns names_A0 and random_A0 respectively.

Solution

You can you Polars Selectors along with some basic strings operations to accomplish this. Depending on what you how you expect this problem to evolve, you can jump straight to regular expressions, or use polars.selectors.ends_with/string.removesuffix

String Suffix Operations

This approach uses

- polars.selectors.ends_with # find columns ending with string
- string.removesuffix        # remove suffix from end of string

translating to

import polars as pl
from polars import selectors as cs
import numpy as np
import re
from functools import partial

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names_A0": ["foo", "ham", "spam", "egg", None],
        "random_A0": np.random.rand(5),
        "A_A2": [True, True, False, False, False],
    }
)
digit = 0
suffix = f'_A{digit}'

print(
    # keep original A0 columns
    df.with_columns(
        cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
    ),
    # shape: (5, 6)
    # ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
    # │ nrs  ┆ names_A0 ┆ random_A0 ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---      ┆ ---       ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ str      ┆ f64       ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ foo      ┆ 0.713324  ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ ham      ┆ 0.980031  ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ spam     ┆ 0.242768  ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ egg      ┆ 0.528783  ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ null     ┆ 0.583206  ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴──────────┴───────────┴───────┴───────┴──────────┘


    # drop original A0 columns
    df.select(
        ~cs.ends_with(suffix),
        cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
    ),
    # shape: (5, 4)
    # ┌──────┬───────┬───────┬──────────┐
    # │ nrs  ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴───────┴───────┴──────────┘

    sep='\n\n'
)

Regular Expressions

Alternatively you can use regular expressions to detect a range of suffix patterns

- polars.selectors.matches  # find columns matching a pattern
- re.sub                    # substitute in string based on pattern

We will need to ensure our pattern ends with a '$' to anchor the pattern to the end of the string.

import polars as pl
from polars import selectors as cs
import numpy as np
import re
from functools import partial

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names_A0": ["foo", "ham", "spam", "egg", None],
        "random_A0": np.random.rand(5),
        "A_A2": [True, True, False, False, False],
    }
)
digit=0
suffix = fr'_A{digit}$'

print(
    # keep original A0 columns
    df.with_columns(
        cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
    ),
    # shape: (5, 6)
    # ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
    # │ nrs  ┆ names_A0 ┆ random_A0 ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---      ┆ ---       ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ str      ┆ f64       ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ foo      ┆ 0.713324  ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ ham      ┆ 0.980031  ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ spam     ┆ 0.242768  ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ egg      ┆ 0.528783  ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ null     ┆ 0.583206  ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴──────────┴───────────┴───────┴───────┴──────────┘


    # drop original A0 columns
    df.select(
        ~cs.matches(suffix),
        cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
    ),
    # shape: (5, 4)
    # ┌──────┬───────┬───────┬──────────┐
    # │ nrs  ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴───────┴───────┴──────────┘

    sep='\n\n'
)