Suppose I have the following product sales data:
Products:
product_id | product_name | category_id |
---|---|---|
1 | Widget A | 1 |
2 | Widget B | 1 |
3 | Gizmo | 2 |
4 | Gadget | 2 |
5 | Doohickey | 3 |
Categories:
category_id | category_name |
---|---|
1 | Electronics |
2 | Toys |
3 | Home & Garden |
Sales:
sale_id | product_id | quantity | sale_date | sale_price |
---|---|---|---|---|
101 | 1 | 10 | 2024-04-01 | 19.99 |
102 | 2 | 5 | 2024-04-02 | 24.99 |
103 | 3 | 7 | 2024-04-03 | 15.99 |
104 | 4 | 4 | 2024-04-04 | 20.99 |
105 | 5 | 3 | 2024-04-05 | 29.99 |
I now want to know the sales of each product and its percentage of total sales.
I tried the following query and got the expected results, but I think the execution time is a bit long, especially when the table becomes larger, is there anything that can be optimized?(BTW, I'm using PieCloudDB Database, if you don't know it you can test with PostgreSQL instead.)
Thanks in advance for your help
WITH TotalSales AS (
SELECT SUM(sale_price * quantity) AS overall_sales
FROM sales
),
CategorySales AS (
SELECT
c.category_name,
p.product_name,
SUM(s.sale_price * s.quantity) AS category_sales
FROM sales s
JOIN products p ON s.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
CROSS JOIN TotalSales
GROUP BY c.category_name, p.product_id, p.product_name
)
SELECT
product_name,
category_name,
category_sales,
ROUND((category_sales * 100.0) / overall_sales, 2) AS percentage_of_total
FROM CategorySales, TotalSales
ORDER BY category_sales DESC;
Performance depends on overall context including indexes, primary keys, foreign keys, relations, cascadings ...
Here is the code using analytic functions, Sum() Over(), that works both with Postgres and Oracle. Cost according to Oracle's Explain plan is 28. Your code has cost 46 and Koa's code has cost 32 which doesn't mean anyhing without you and your tests and comparisons within your actual context.
-- S Q L :
SELECT *
FROM ( Select p.PRODUCT_NAME, c.CATEGORY_NAME,
Sum(s.SALE_PRICE * s.QUANTITY) Over(Partition By p.PRODUCT_ID) as CATEGORY_SALES,
--
Round( Sum(SALE_PRICE * QUANTITY) Over(Partition By p.PRODUCT_ID) * 100 /
Sum(SALE_PRICE * QUANTITY) Over(), 2 ) as PERCENTAGE_OF_TOTAL
From products p
Left Join sales s ON s.PRODUCT_ID = p.PRODUCT_ID
Left Join categories c ON c.CATEGORY_ID = p.CATEGORY_ID
)
ORDER BY CATEGORY_SALES Desc
/* R e s u l t :
PRODUCT_NAME CATEGORY_NAME CATEGORY_SALES PERCENTAGE_OF_TOTAL
------------ ------------- -------------- -------------------
Widget A Electronics 199,9 32,73
Widget B Electronics 124,95 20,46
Gizmo Toys 111,93 18,33
Doohickey Home & Garden 89,97 14,73
Gadget Toys 83,96 13,75 */
Notes:
It is better to use Join ON syntax instead of list of tables which is deprecated for 30+ years.
I followed your code and column names. Column CATEGORY_SALES is actually PRODUCT_SALES since the aggregation is on product level.
Tests (for all 3 codes posted so far) done using CTEs instead of creating tables for there are no info about keys, indexes, relations etc...
CTEs:
WITH
products ( PRODUCT_ID, PRODUCT_NAME, CATEGORY_ID ) AS
( Select 1, 'Widget A', 1 From Dual Union All
Select 2, 'Widget B', 1 From Dual Union All
Select 3, 'Gizmo', 2 From Dual Union All
Select 4, 'Gadget', 2 From Dual Union All
Select 5, 'Doohickey', 3 From Dual
),
categories ( CATEGORY_ID, CATEGORY_NAME) AS
( Select 1, 'Electronics' From Dual Union All
Select 2, 'Toys' From Dual Union All
Select 3, 'Home ' || '&' || ' Garden' From Dual
),
sales ( SALE_ID, PRODUCT_ID, QUANTITY, SALE_DATE, SALE_PRICE ) AS
( Select 101, 1, 10, DATE '2024-04-01', 19.99 From Dual Union All
Select 102, 2, 5, DATE '2024-04-02', 24.99 From Dual Union All
Select 103, 3, 7, DATE '2024-04-03', 15.99 From Dual Union All
Select 104, 4, 4, DATE '2024-04-04', 20.99 From Dual Union All
Select 105, 5, 3, DATE '2024-04-05', 29.99 From Dual
)