I was practicing SQL questions on Datalemur and came across this interesting problem.
Here’s the table structure and sample data I’m working with:
-- Create the table
CREATE TABLE pizza_toppings
(
topping_name VARCHAR2(255),
ingredient_cost DECIMAL(10, 2)
);
-- Insert sample data
INSERT INTO pizza_toppings (topping_name, ingredient_cost)
VALUES ('Pepperoni', 0.50);
INSERT INTO pizza_toppings (topping_name, ingredient_cost)
VALUES ('Sausage', 0.70);
INSERT INTO pizza_toppings (topping_name, ingredient_cost)
VALUES ('Chicken', 0.55);
INSERT INTO pizza_toppings (topping_name, ingredient_cost)
VALUES ('Extra Cheese', 0.40);
-- Commit the changes
COMMIT;
I noticed that many people solve this using CROSS JOIN
, and I understand that approach.
However, I came across the following recursive CTE solution, which I'm finding difficult to fully understand — particularly how the recursion works internally, step by step.
Could someone please help break down this recursive query and explain how it's building the combinations ?
Here is the query:
-- Generate all unique 3-topping combinations with total cost
WITH topping_list AS
(
SELECT topping_name, ingredient_cost
FROM pizza_toppings
),
combo_cte (pizza, total_cost, last_topping) AS
(
-- Anchor member: start with single topping
SELECT
topping_name AS pizza,
ingredient_cost AS total_cost,
topping_name AS last_topping
FROM
topping_list
UNION ALL
-- Recursive member: build combinations by adding toppings in alphabetical order
SELECT
c.pizza || ',' || t.topping_name AS pizza,
c.total_cost + t.ingredient_cost AS total_cost,
t.topping_name AS last_topping
FROM
combo_cte c
JOIN
topping_list t ON t.topping_name > c.last_topping -- ensures unique combinations only
)
-- Filter only 3-topping combos and show total cost
SELECT
pizza,
ROUND(total_cost, 2) AS total_cost
FROM
(SELECT
pizza,
total_cost,
LENGTH(pizza) - LENGTH(REPLACE(pizza, ',', '')) + 1 AS topping_count
FROM
combo_cte
)
WHERE
topping_count = 3
ORDER BY
total_cost DESC;
Given your query:
WITH topping_list AS
(
SELECT topping_name, ingredient_cost
FROM pizza_toppings
),
Then the first sub-query factoring clause is redundant as you are just creating an in-line view with the same columns and a different name. Similarly, generating the recursive query and then using post-processing to count the number of toppings is generating extra work and could be simplified by moving the counting into the recursive query. Which means the query can be simplified to something that is easier to explain:
WITH combo_cte (pizza, total_cost, last_topping, topping_count) AS
(
-- Anchor member: start with single topping
SELECT topping_name AS pizza,
ingredient_cost AS total_cost,
topping_name AS last_topping,
1 AS topping_count
FROM pizza_toppings
UNION ALL
-- Recursive member: build combinations by adding toppings in alphabetical order
SELECT c.pizza || ',' || t.topping_name AS pizza,
c.total_cost + t.ingredient_cost AS total_cost,
t.topping_name AS last_topping,
1 + c.topping_count AS topping_count
FROM combo_cte c
INNER JOIN pizza_toppings t
ON t.topping_name > c.last_topping -- ensures unique combinations only
-- Stop counting when the desired number of toppings is reached.
WHERE c.topping_count < 3
)
-- Filter only 3-topping combos and show total cost
SELECT pizza,
total_cost
FROM combo_cte
WHERE topping_count = 3
ORDER BY total_cost DESC;
To show how it works, instead of the final SELECT
, we can retrieve all the rows using
WITH combo_cte (pizza, total_cost, last_topping, topping_count) AS
(
...
)
SELECT *
FROM combo_cte
Which for the "anchor member" part generates:
PIZZA | TOTAL_COST | LAST_TOPPING | TOPPING_COUNT |
---|---|---|---|
Pepperoni | .5 | Pepperoni | 1 |
Sausage | .7 | Sausage | 1 |
Chicken | .55 | Chicken | 1 |
Extra Cheese | .4 | Extra Cheese | 1 |
Creating a single row for each ingredient and setting the last topping to the same value and having a single topping count.
Then on the second, recursive pass, the "recursive member" part takes each row of the previous iteration and joins the ingredient table to it, if the new ingredient is later in the alphabet than the previous, generating:
PIZZA | TOTAL_COST | LAST_TOPPING | TOPPING_COUNT |
---|---|---|---|
Pepperoni,Sausage | 1.2 | Sausage | 2 |
Chicken,Pepperoni | 1.05 | Pepperoni | 2 |
Chicken,Sausage | 1.25 | Sausage | 2 |
Chicken,Extra Cheese | .95 | Extra Cheese | 2 |
Extra Cheese,Pepperoni | .9 | Pepperoni | 2 |
Extra Cheese,Sausage | 1.1 | Sausage | 2 |
So for the pervious Pepperoni
row, there is only a single ingredient, Sausage
, later in the alphabet so that is joined generating a new row. For the Sausage
row for the anchor members, there are no ingredients with later names so nothing is generated. Similarly, Chicken
is combined with each of 3 ingredients and Extra Cheese
with each of the 2 later ingredients.
Then, it is repeated for a third, recursive pass. This time, it is the output of the 2nd recursive pass that is used to generate the rows:
PIZZA | TOTAL_COST | LAST_TOPPING | TOPPING_COUNT |
---|---|---|---|
Chicken,Pepperoni,Sausage | 1.75 | Sausage | 3 |
Chicken,Extra Cheese,Pepperoni | 1.45 | Pepperoni | 3 |
Chicken,Extra Cheese,Sausage | 1.65 | Sausage | 3 |
Extra Cheese,Pepperoni,Sausage | 1.6 | Sausage | 3 |
Anything from the previous recursion that had the LAST_TOPPING
of Sausage
will not have any later ingredients to join to. The other rows (Chicken,Pepperoni
, Chicken,Extra Cheese
and Extra Cheese,Pepperoni
) all have more ingredients that are later in the alphabet and these will generate additional rows for the third ingredient.
The query will try to recurse for another pass but the filter WHERE c.topping_count < 3
will filter out all the rows so nothing is generated and this causes the query to stop recursing.
If you comment out the WHERE
filter then the query would continue generating toppings:
PIZZA | TOTAL_COST | LAST_TOPPING | TOPPING_COUNT |
---|---|---|---|
Chicken,Extra Cheese,Pepperoni,Sausage | 2.15 | Sausage | 4 |
Then all combinations are generated and when it tried to generate a combinations for 5 ingredients there are no ingredients that are later in the alphabet than Sausage
so nothing could be joined and the recursion will halt.
But, since we only want 3 toppings then it is more efficient to use the WHERE
filter to stop the recursion at the appropriate point.
One you have that then all your full query is doing is filtering to those pizzas that have exactly 3 ingredients and ordering by cost, giving the final output of:
PIZZA | TOTAL_COST |
---|---|
Chicken,Pepperoni,Sausage | 1.75 |
Chicken,Extra Cheese,Sausage | 1.65 |
Extra Cheese,Pepperoni,Sausage | 1.6 |
Chicken,Extra Cheese,Pepperoni | 1.45 |