pythonpython-polars

Sum of products of columns in polars


I have a dataset, part of which looks like this:

customer product price quantity sale_time
C060235 P0204 6.99 2 2024-03-11 08:24:11
C045298 P0167 14.99 1 2024-03-11 08:35:06
...
C039877 P0024 126.95 1 2024-09-30 21:18:45

What I want is a list of unique customer, product pairs with the total sales, so something like:

customer product total
C0000105 P0168 643.78
C0000105 P0204 76.88
...
C1029871 P1680 435.44

Here's my attempt at constructing this. This gives me the grand total of all sales, which isn't what I want. What's a correct approach?

import polars as pl

db.select(
    (
        pl.col('customer'),
        pl.col('product'),
        pl.col('quantity').mul(pl.col('price')).alias('total')
    )
).group_by(('customer', 'product'))

Solution

  • To do this calculate the sale amount for each row then group by both customer and product columns, and then sum the calculated amounts within each group

    Your current query has a few issues:

    This approach works:

    db.group_by(["customer", "product"]).agg([
        ((pl.col("quantity") * pl.col("price")).sum()).alias("total")
    ])
    

    A more concise alternative is the expr.dot:

    db.group_by("customer", "product").agg(
        total=pl.col("quantity").dot("price")
    )