I have a dict
and a polars DataFrame
and want to map one column to the values of the dict:
import polars as pl
df = pl.DataFrame({
'label': ['AA', 'BB', 'AA', 'CC'],
'type': ['CASH', 'ITEM', 'CHECK', 'CHECK'],
})
mapping = {
'CASH': {'qty': 1, 'origin': 'E'},
'ITEM': {'qty': -9, 'origin': 'A'},
'CHECK': {'qty': 46, 'origin': 'A'},
}
df.with_columns(pl.col('type').replace_strict(mapping).alias('mapped'))
This outputs
shape: (4, 3)
┌───────┬───────┬───────────┐
│ label ┆ type ┆ mapped │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ struct[2] │
╞═══════╪═══════╪═══════════╡
│ AA ┆ CASH ┆ {1,"E"} │
│ BB ┆ ITEM ┆ {-9,"A"} │
│ AA ┆ CHECK ┆ {46,"A"} │
│ CC ┆ CHECK ┆ {46,"A"} │
└───────┴───────┴───────────┘
The problem is, it only takes the values of the dict and entirely drops the keys.
So I tried using replace_strict(mapping, return_dtype=pl.Object)
, but this gives error
File site-packages\polars\lazyframe\frame.py:2026,
in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown,
simplify_expression, slice_pushdown, comm_subplan_elim,
comm_subexpr_elim, cluster_with_columns, no_optimization,
streaming, engine, background, _eager, **_kwargs)
2024 # Only for testing purposes
2025 callback = _kwargs.get("post_opt_callback", callback)
-> 2026 return wrap_df(ldf.collect(callback))
InvalidOperationError: casting from Int64 to Unknown not supported
Ultimately, the output I am after is the table below. How do I achieve this?
shape: (4, 3)
┌───────┬───────┬─────────────────────────┐
│ label ┆ type ┆ mapped │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ object │
╞═══════╪═══════╪═════════════════════════╡
│ AA ┆ CASH ┆ {"qty":1,"origin":"E"} │
│ BB ┆ ITEM ┆ {"qty":-9,"origin":"A"} │
│ AA ┆ CHECK ┆ {"qty":46,"origin":"A"} │
│ CC ┆ CHECK ┆ {"qty":46,"origin":"A"} │
└───────┴───────┴─────────────────────────┘
I am using polars==1.3.0
Just to be clear, polars doesn't drop the keys, it just implicitly converts python dictionaries to Struct
You can easily check it by using, for example, unnest()
:
res = df.with_columns(pl.col('type').replace_strict(mapping).alias('mapped'))
┌───────┬───────┬───────────┐
│ label ┆ type ┆ mapped │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ struct[2] │
╞═══════╪═══════╪═══════════╡
│ AA ┆ CASH ┆ {1,"E"} │
│ BB ┆ ITEM ┆ {-9,"A"} │
│ AA ┆ CHECK ┆ {46,"A"} │
│ CC ┆ CHECK ┆ {46,"A"} │
└───────┴───────┴───────────┘
res.unnest('mapped')
┌───────┬───────┬─────┬────────┐
│ label ┆ type ┆ qty ┆ origin │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ str │
╞═══════╪═══════╪═════╪════════╡
│ AA ┆ CASH ┆ 1 ┆ E │
│ BB ┆ ITEM ┆ -9 ┆ A │
│ AA ┆ CHECK ┆ 46 ┆ A │
│ CC ┆ CHECK ┆ 46 ┆ A │
└───────┴───────┴─────┴────────┘
All the key names are there, you just don't see it in the text representation. So question is - do you actually need data in the column to be python dictionaries? It'd be much easier to work with native polars struct.