I have a dataset, part of which looks like this:
customer | product | price | quantity | sale_time |
---|---|---|---|---|
C060235 | P0204 | 6.99 | 2 | 2024-03-11 08:24:11 |
C045298 | P0167 | 14.99 | 1 | 2024-03-11 08:35:06 |
... | ||||
C039877 | P0024 | 126.95 | 1 | 2024-09-30 21:18:45 |
What I want is a list of unique customer, product pairs with the total sales, so something like:
customer | product | total |
---|---|---|
C0000105 | P0168 | 643.78 |
C0000105 | P0204 | 76.88 |
... | ||
C1029871 | P1680 | 435.44 |
Here's my attempt at constructing this. This gives me the grand total of all sales, which isn't what I want. What's a correct approach?
import polars as pl
db.select(
(
pl.col('customer'),
pl.col('product'),
pl.col('quantity').mul(pl.col('price')).alias('total')
)
).group_by(('customer', 'product'))
To do this calculate the sale amount for each row then group by both customer and product columns, and then sum the calculated amounts within each group
Your current query has a few issues:
product
and customer
but grouping by item_lookup_key
and shopper_card_number
This approach works:
db.group_by(["customer", "product"]).agg([
((pl.col("quantity") * pl.col("price")).sum()).alias("total")
])
A more concise alternative is the expr.dot
:
db.group_by("customer", "product").agg(
total=pl.col("quantity").dot("price")
)