pythonsplitpython-polars

Polars split column and get n-th (or last) element


I have the following code and output.

Code.

import polars as pl

df = pl.DataFrame({
    'type': ['A', 'O', 'B', 'O'],
    'id':   ['CASH', 'ORB.A123', 'CHECK', 'OTC.BV32']
})

df.with_columns(sub_id=pl.when(pl.col('type') == 'O').then(pl.col('id').str.split('.')).otherwise(None))

Output.

shape: (4, 3)
┌──────┬──────────┬─────────────────┐
│ type ┆ id       ┆ sub_id          │
│ ---  ┆ ---      ┆ ---             │
│ str  ┆ str      ┆ list[str]       │
╞══════╪══════════╪═════════════════╡
│ A    ┆ CASH     ┆ null            │
│ O    ┆ ORB.A123 ┆ ["ORB", "A123"] │
│ B    ┆ CHECK    ┆ null            │
│ O    ┆ OTC.BV32 ┆ ["OTC", "BV32"] │
└──────┴──────────┴─────────────────┘

Now, how would I extract the n-th element (or in this case, the last element) of each list?

Especially, the expected output is the following.

shape: (4, 3)
┌──────┬──────────┬────────────┐
│ type ┆ id       ┆ sub_id     │
│ ---  ┆ ---      ┆ ---        │
│ str  ┆ str      ┆ str        │
╞══════╪══════════╪════════════╡
│ A    ┆ CASH     ┆ null       │
│ O    ┆ ORB.A123 ┆ "A123"     │
│ B    ┆ CHECK    ┆ null       │
│ O    ┆ OTC.BV32 ┆ "BV32"     │
└──────┴──────────┴────────────┘

Solution

  • You can simply append .list.last() to select the last element of each list.

    Alternatively, there exists .list.get() to get list elements by index.

    import polars as pl
    
    df = pl.DataFrame({
        'type': ['A', 'O', 'B', 'O'],
        'id':   ['CASH', 'ORB.A123', 'CHECK', 'OTC.BV32']
    })
    
    df.with_columns(
        sub_id=pl.when(
            pl.col('type') == 'O'
        ).then(
            pl.col('id').str.split('.').list.last()
        )
    )
    
    shape: (4, 3)
    ┌──────┬──────────┬────────┐
    │ type ┆ id       ┆ sub_id │
    │ ---  ┆ ---      ┆ ---    │
    │ str  ┆ str      ┆ str    │
    ╞══════╪══════════╪════════╡
    │ A    ┆ CASH     ┆ null   │
    │ O    ┆ ORB.A123 ┆ A123   │
    │ B    ┆ CHECK    ┆ null   │
    │ O    ┆ OTC.BV32 ┆ BV32   │
    └──────┴──────────┴────────┘
    

    Note that I've dropped .otherwise(None) as this is the default behaviour of an if-then-otherwise expression.