sqlquery-optimization

Sales data analysis: calculate percentage of each product as per total sales


Suppose I have the following product sales data:

Products:

product_id product_name category_id
1 Widget A 1
2 Widget B 1
3 Gizmo 2
4 Gadget 2
5 Doohickey 3

Categories:

category_id category_name
1 Electronics
2 Toys
3 Home & Garden

Sales:

sale_id product_id quantity sale_date sale_price
101 1 10 2024-04-01 19.99
102 2 5 2024-04-02 24.99
103 3 7 2024-04-03 15.99
104 4 4 2024-04-04 20.99
105 5 3 2024-04-05 29.99

I now want to know the sales of each product and its percentage of total sales.

I tried the following query and got the expected results, but I think the execution time is a bit long, especially when the table becomes larger, is there anything that can be optimized?(BTW, I'm using PieCloudDB Database, if you don't know it you can test with PostgreSQL instead.)

Thanks in advance for your help

WITH TotalSales AS (
    SELECT SUM(sale_price * quantity) AS overall_sales
    FROM sales
),
CategorySales AS (
    SELECT
        c.category_name,
        p.product_name,
        SUM(s.sale_price * s.quantity) AS category_sales
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    JOIN categories c ON p.category_id = c.category_id
    CROSS JOIN TotalSales
    GROUP BY c.category_name, p.product_id, p.product_name
)
SELECT
    product_name,
    category_name,
    category_sales,
    ROUND((category_sales * 100.0) / overall_sales, 2) AS percentage_of_total
FROM CategorySales, TotalSales
ORDER BY category_sales DESC;

Solution

  • Performance depends on overall context including indexes, primary keys, foreign keys, relations, cascadings ...
    Here is the code using analytic functions, Sum() Over(), that works both with Postgres and Oracle. Cost according to Oracle's Explain plan is 28. Your code has cost 46 and Koa's code has cost 32 which doesn't mean anyhing without you and your tests and comparisons within your actual context.

    --      S Q L :
    SELECT  *
    FROM    ( Select  p.PRODUCT_NAME, c.CATEGORY_NAME, 
                      Sum(s.SALE_PRICE * s.QUANTITY) Over(Partition By p.PRODUCT_ID) as CATEGORY_SALES, 
                      --
                      Round( Sum(SALE_PRICE * QUANTITY) Over(Partition By p.PRODUCT_ID) * 100 /
                             Sum(SALE_PRICE * QUANTITY) Over(), 2 ) as PERCENTAGE_OF_TOTAL
              From        products p 
              Left  Join  sales s ON s.PRODUCT_ID = p.PRODUCT_ID
              Left  Join  categories c ON c.CATEGORY_ID = p.CATEGORY_ID
            )
    ORDER BY CATEGORY_SALES Desc
    
    /*      R e s u l t :
    PRODUCT_NAME CATEGORY_NAME CATEGORY_SALES PERCENTAGE_OF_TOTAL
    ------------ ------------- -------------- -------------------
    Widget A     Electronics            199,9               32,73
    Widget B     Electronics           124,95               20,46
    Gizmo        Toys                  111,93               18,33
    Doohickey    Home & Garden          89,97               14,73
    Gadget       Toys                   83,96               13,75    */
    

    Notes:
    It is better to use Join ON syntax instead of list of tables which is deprecated for 30+ years.

    I followed your code and column names. Column CATEGORY_SALES is actually PRODUCT_SALES since the aggregation is on product level.
    Tests (for all 3 codes posted so far) done using CTEs instead of creating tables for there are no info about keys, indexes, relations etc...
    CTEs:

    WITH
        products ( PRODUCT_ID, PRODUCT_NAME, CATEGORY_ID ) AS
            ( Select 1, 'Widget A',  1 From Dual Union All 
              Select 2, 'Widget B',  1 From Dual Union All 
              Select 3, 'Gizmo',     2 From Dual Union All 
              Select 4, 'Gadget',    2 From Dual Union All 
              Select 5, 'Doohickey', 3 From Dual 
            ), 
        categories ( CATEGORY_ID, CATEGORY_NAME) AS 
            ( Select 1, 'Electronics'  From Dual Union All 
              Select 2, 'Toys' From Dual Union All 
              Select 3, 'Home ' || '&' || ' Garden' From Dual 
            ),
        sales ( SALE_ID, PRODUCT_ID, QUANTITY, SALE_DATE, SALE_PRICE ) AS
            ( Select 101, 1, 10, DATE '2024-04-01', 19.99 From Dual Union All 
              Select 102, 2,  5, DATE '2024-04-02', 24.99 From Dual Union All 
              Select 103, 3,  7, DATE '2024-04-03', 15.99 From Dual Union All 
              Select 104, 4,  4, DATE '2024-04-04', 20.99 From Dual Union All 
              Select 105, 5,  3, DATE '2024-04-05', 29.99 From Dual 
            )