pythonpython-polars

How to swap column values on conditions in python polars?


I have a data frame as below:

df = pl.DataFrame({'last_name':[None,'mallesh','bhavik'],
                   'first_name':['a','b','c'],
                   'middle_name_or_initial':['aa','bb','cc']})
shape: (3, 3)
┌───────────┬────────────┬────────────────────────┐
│ last_name ┆ first_name ┆ middle_name_or_initial │
│ ---       ┆ ---        ┆ ---                    │
│ str       ┆ str        ┆ str                    │
╞═══════════╪════════════╪════════════════════════╡
│ null      ┆ a          ┆ aa                     │
│ mallesh   ┆ b          ┆ bb                     │
│ bhavik    ┆ c          ┆ cc                     │
└───────────┴────────────┴────────────────────────┘

Here I would like to find an observation which has First and Middle Name not NULL and Last Name is Null.

In this case last_name should take the value from first_name, first_name should take the value from middle_name, and middle_name should be NULL.

expected output will be:

shape: (3, 3)
┌───────────┬────────────┬────────────────────────┐
│ last_name ┆ first_name ┆ middle_name_or_initial │
│ ---       ┆ ---        ┆ ---                    │
│ str       ┆ str        ┆ str                    │
╞═══════════╪════════════╪════════════════════════╡
│ a         ┆ aa         ┆ null                   │
│ mallesh   ┆ b          ┆ bb                     │
│ bhavik    ┆ c          ┆ cc                     │
└───────────┴────────────┴────────────────────────┘

I'm trying with this command:

df.with_columns(
    pl.when((pl.col('first_name').is_not_null()) & (pl.col('middle_name_or_initial').is_not_null()) & (pl.col('last_name').is_null()))
      .then(pl.col('first_name').alias('last_name')).otherwise(pl.col('last_name').alias('first_name')),
    pl.when((pl.col('first_name').is_not_null()) & (pl.col('middle_name_or_initial').is_not_null()) & (pl.col('last_name').is_null()))
      .then(pl.col('middle_name_or_initial').alias('first_name')).alias('middle_name_or_initial')
)

But it is not producing the expected output.


Solution

  • You can actually change the values of multiple columns within a single when/then/otherwise statement.

    The Algorithm

    name_cols = "last_name", "first_name", "middle_name_or_initial",
    (
        df.with_columns(
            pl.when(
                pl.col("first_name").is_not_null(),
                pl.col("middle_name_or_initial").is_not_null(),
                pl.col("last_name").is_null()
            )
            .then(pl.struct(
                last_name = "first_name",
                first_name = "middle_name_or_initial",
                middle_name_or_initial = None
            ))
            .otherwise(pl.struct(name_cols))
            .struct.unnest()
        )
    )
    
    shape: (3, 3)
    ┌───────────┬────────────┬────────────────────────┐
    │ last_name ┆ first_name ┆ middle_name_or_initial │
    │ ---       ┆ ---        ┆ ---                    │
    │ str       ┆ str        ┆ str                    │
    ╞═══════════╪════════════╪════════════════════════╡
    │ a         ┆ aa         ┆ null                   │
    │ mallesh   ┆ b          ┆ bb                     │
    │ bhavik    ┆ c          ┆ cc                     │
    └───────────┴────────────┴────────────────────────┘
    

    How it works

    To change the values of multiple columns within a single when/then/otherwise statement, we can use Structs.

    We use pl.struct() to create multiple fields (something as multiple sub-columns) within a single column.

    df.select(
        then=pl.struct(
            last_name = pl.col("first_name"),
            first_name = pl.col("middle_name_or_initial"),
            middle_name_or_initial = pl.lit(None),
        ),
        otherwise = pl.struct(name_cols),
    )
    
    shape: (3, 2)
    ┌─────────────────┬──────────────────────┐
    │ then            ┆ otherwise            │
    │ ---             ┆ ---                  │
    │ struct[3]       ┆ struct[3]            │
    ╞═════════════════╪══════════════════════╡
    │ {"a","aa",null} ┆ {null,"a","aa"}      │
    │ {"b","bb",null} ┆ {"mallesh","b","bb"} │
    │ {"c","cc",null} ┆ {"bhavik","c","cc"}  │
    └─────────────────┴──────────────────────┘
    

    .struct.unnest() turns each field of the Struct into its own column and thereby replaces existing columns with the fields of the Struct if existing columns and fields of the Struct have the same name.