
How to combine two columns into `{key:value}` pairs in polars?


I'm working with a Polars DataFrame, and I want to combine two columns into a dictionary format, where the values from one column become the keys and the values from the other column become the corresponding values.

Here's an example DataFrame:

import polars as pl

df = pl.DataFrame({
    "name": ["Chuck", "John", "Alice"],
    "surname": ["Dalliston", "Doe", "Smith"]
})

I want to transform this DataFrame into a new column that contains dictionaries, where name is the key and surname is the value. The expected outcome should look like this:

shape: (3, 3)
┌───────┬───────────┬────────────────────────┐
│ name  │ surname   │ name_surname           │
│ ---   │ ---       │ ---                    │
│ str   │ str       │ dict[str, str]         │
├───────┼───────────┼────────────────────────┤
│ Chuck │ Dalliston │ {"Chuck": "Dalliston"} │
│ John  │ Doe       │ {"John": "Doe"}        │
│ Alice │ Smith     │ {"Alice": "Smith"}     │
└───────┴───────────┴────────────────────────┘

I've tried the following code:

import json

df.with_columns(
    json=pl.struct("name", "surname").map_elements(json.dumps)
)

But the result is not what I expected. Instead of a dictionary with a single key-value pair, each row uses the column names as keys:

{name:Chuck,surname:Dalliston}

Solution

  • You can try the code snippet below. This seems to be the closest you can get, as Polars does not have a native dict type, so the dictionaries are stored in a column of dtype `pl.Object`.

    See reference: Polars data types

    import polars as pl
    
    df = pl.DataFrame(
        {"name": ["Chuck", "John", "Alice"], "surname": ["Dalliston", "Doe", "Smith"]}
    )
    
    df = df.select(
        [
            "name",
            "surname",
            (
                pl.struct(["name", "surname"]).map_elements(
                    lambda row: {row["name"]: row["surname"]}, return_dtype=pl.Object
                )
            ).alias("name_surname"),
        ]
    )
    print(df)
    
    ┌───────┬───────────┬────────────────────────┐
    │ name  ┆ surname   ┆ name_surname           │
    │ ---   ┆ ---       ┆ ---                    │
    │ str   ┆ str       ┆ object                 │
    ╞═══════╪═══════════╪════════════════════════╡
    │ Chuck ┆ Dalliston ┆ {'Chuck': 'Dalliston'} │
    │ John  ┆ Doe       ┆ {'John': 'Doe'}        │
    │ Alice ┆ Smith     ┆ {'Alice': 'Smith'}     │
    └───────┴───────────┴────────────────────────┘