I'm working with a Polars DataFrame, and I want to combine two columns into a dictionary format, where the values from one column become the keys and the values from the other column become the corresponding values.
Here's an example DataFrame:
import polars as pl

df = pl.DataFrame({
    "name": ["Chuck", "John", "Alice"],
    "surname": ["Dalliston", "Doe", "Smith"]
})
I want to transform this DataFrame into a new column that contains dictionaries, where name is the key and surname is the value. The expected outcome should look like this:
shape: (3, 3)
┌───────┬───────────┬────────────────────────┐
│ name  │ surname   │ name_surname           │
│ ---   │ ---       │ ---                    │
│ str   │ str       │ dict[str, str]         │
├───────┼───────────┼────────────────────────┤
│ Chuck │ Dalliston │ {"Chuck": "Dalliston"} │
│ John  │ Doe       │ {"John": "Doe"}        │
│ Alice │ Smith     │ {"Alice": "Smith"}     │
└───────┴───────────┴────────────────────────┘
I've tried the following code:
import json

df.with_columns(
    json=pl.struct("name", "surname").map_elements(json.dumps)
)
But the result is not as expected. Instead of creating a dictionary with a single key-value pair, it serializes the struct with the column names as keys:
{name:Chuck,surname:Dalliston}
You can try this code snippet. This seems to be the closest you can get, as Polars does not have a native dict type.
See reference: data_types_polars
import polars as pl

df = pl.DataFrame(
    {"name": ["Chuck", "John", "Alice"], "surname": ["Dalliston", "Doe", "Smith"]}
)

# Build a one-entry Python dict per row; pl.Object stores it as an opaque Python object.
df = df.select(
    [
        "name",
        "surname",
        (
            pl.struct(["name", "surname"]).map_elements(
                lambda row: {row["name"]: row["surname"]}, return_dtype=pl.Object
            )
        ).alias("name_surname"),
    ]
)
print(df)
┌───────┬───────────┬────────────────────────┐
│ name  ┆ surname   ┆ name_surname           │
│ ---   ┆ ---       ┆ ---                    │
│ str   ┆ str       ┆ object                 │
╞═══════╪═══════════╪════════════════════════╡
│ Chuck ┆ Dalliston ┆ {'Chuck': 'Dalliston'} │
│ John  ┆ Doe       ┆ {'John': 'Doe'}        │
│ Alice ┆ Smith     ┆ {'Alice': 'Smith'}     │
└───────┴───────────┴────────────────────────┘