I have the following code.
import polars as pl

class Summary:
    def __init__(self, value: float, origin: str):
        self.value = value
        self.origin = origin

    def __repr__(self) -> str:
        return f'Summary({self.value},{self.origin})'

    def __mul__(self, x: float | int) -> 'Summary':
        return Summary(self.value * x, self.origin)

    def __rmul__(self, x: float | int) -> 'Summary':
        return self * x

mapping = {
    'CASH': Summary( 1, 'E'),
    'ITEM': Summary(-9, 'A'),
    'CHECK': Summary(46, 'A'),
}

df = pl.DataFrame({'quantity': [7, 4, 10], 'type': mapping.keys(), 'summary': mapping.values()})
The dataframe df looks as follows.
shape: (3, 3)
┌──────────┬───────┬───────────────┐
│ quantity ┆ type  ┆ summary       │
│ ---      ┆ ---   ┆ ---           │
│ i64      ┆ str   ┆ object        │
╞══════════╪═══════╪═══════════════╡
│ 7        ┆ CASH  ┆ Summary(1,E)  │
│ 4        ┆ ITEM  ┆ Summary(-9,A) │
│ 10       ┆ CHECK ┆ Summary(46,A) │
└──────────┴───────┴───────────────┘
In particular, the summary column contains Summary objects, which support multiplication. Now, I'd like to multiply this column by the quantity column.
However, the naive approach raises an error.
df.with_columns(pl.col('quantity').mul(pl.col('summary')).alias('qty_summary'))
SchemaError: failed to determine supertype of i64 and object
Is there a way to multiply these columns?
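Remember that Polars is designed so that computations run in Rust rather than Python, which is typically much faster. That is also why the multiplication fails: an object column holds opaque Python objects, so Polars cannot find a common supertype for i64 and object. If you need to run Python operations on each element, you lose much of the benefit of using Polars in the first place.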
But, thankfully, Polars does have a very nice feature that is relevant here, which is “native” processing of dataclasses.
import polars as pl
from dataclasses import dataclass

@dataclass
class Summary:
    value: float
    origin: str

    def __mul__(self, x: float | int) -> "Summary":
        return Summary(self.value * x, self.origin)

    def __rmul__(self, x: float | int) -> "Summary":
        return self * x

mapping = {
    "CASH": Summary(1, "E"),
    "ITEM": Summary(-9, "A"),
    "CHECK": Summary(46, "A"),
}

df = pl.DataFrame(
    {
        "quantity": [7, 4, 10],
        "type": mapping.keys(),
        "summary": mapping.values(),
    }
)
df
Because Summary is a dataclass, you 1. don't need __init__ and __repr__ (they come for free), and 2. don't need to do anything special for Polars to struct-ify them.
shape: (3, 3)
┌──────────┬───────┬────────────┐
│ quantity ┆ type  ┆ summary    │
│ ---      ┆ ---   ┆ ---        │
│ i64      ┆ str   ┆ struct[2]  │
╞══════════╪═══════╪════════════╡
│ 7        ┆ CASH  ┆ {1.0,"E"}  │
│ 4        ┆ ITEM  ┆ {-9.0,"A"} │
│ 10       ┆ CHECK ┆ {46.0,"A"} │
└──────────┴───────┴────────────┘
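To make the "come for free" point concrete, here is a small standard-library sketch (no Polars involved) of what the dataclass decorator generates. The variable names text, scaled and as_fields are just for illustration, and the annotation is simplified to float.

```python
from dataclasses import asdict, dataclass


@dataclass
class Summary:
    value: float
    origin: str

    def __mul__(self, x: float) -> "Summary":
        return Summary(self.value * x, self.origin)

    def __rmul__(self, x: float) -> "Summary":
        return self * x


s = Summary(1.0, "E")        # __init__ is generated from the field list
text = repr(s)               # __repr__ is generated too
scaled = 7 * s               # int.__mul__ gives up, so Summary.__rmul__ runs
as_fields = asdict(scaled)   # plain dict of fields, the same shape as a struct row
```

The generated repr names each field (Summary(value=1.0, origin='E')), and asdict returns one key per field, which is roughly the shape the struct column has above.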
Now you can just do regular Polars struct ops:
df.with_columns(
    qty_summary=pl.struct(
        pl.col("summary").struct.field("value") * pl.col("quantity"),
        pl.col("summary").struct.field("origin"),
    )
)