sqloracle-database

How Is a Recursive CTE Internally Evaluated in the 3-Topping Pizzas SQL Query?


I was practicing SQL questions on Datalemur and came across this interesting problem.

Here’s the table structure and sample data I’m working with:

-- Create the table
CREATE TABLE pizza_toppings 
(
    topping_name VARCHAR2(255),
    ingredient_cost DECIMAL(10, 2)
);

-- Insert sample data
INSERT INTO pizza_toppings (topping_name, ingredient_cost) 
VALUES ('Pepperoni', 0.50);
INSERT INTO pizza_toppings (topping_name, ingredient_cost) 
VALUES ('Sausage', 0.70);
INSERT INTO pizza_toppings (topping_name, ingredient_cost) 
VALUES ('Chicken', 0.55);
INSERT INTO pizza_toppings (topping_name, ingredient_cost) 
VALUES ('Extra Cheese', 0.40);

-- Commit the changes
COMMIT;

Table

I noticed that many people solve this using CROSS JOIN, and I understand that approach.

However, I came across the following recursive CTE solution, which I'm finding difficult to fully understand — particularly how the recursion works internally, step by step.

Could someone please help break down this recursive query and explain how it's building the combinations ?

Here is the query:

-- Generate all unique 3-topping combinations with total cost
WITH topping_list AS 
(
    SELECT topping_name, ingredient_cost
    FROM pizza_toppings
),
combo_cte (pizza, total_cost, last_topping) AS 
(
    -- Anchor member: start with single topping
    SELECT 
        topping_name AS pizza,
        ingredient_cost AS total_cost,
        topping_name AS last_topping
    FROM 
        topping_list

    UNION ALL

    -- Recursive member: build combinations by adding toppings in alphabetical order
    SELECT 
        c.pizza || ',' || t.topping_name AS pizza,
        c.total_cost + t.ingredient_cost AS total_cost,
        t.topping_name AS last_topping
    FROM 
        combo_cte c
    JOIN 
        topping_list t ON t.topping_name > c.last_topping  -- ensures unique combinations only
)
-- Filter only 3-topping combos and show total cost
SELECT 
    pizza,
    ROUND(total_cost, 2) AS total_cost
FROM 
    (SELECT 
         pizza,
         total_cost,
         LENGTH(pizza) - LENGTH(REPLACE(pizza, ',', '')) + 1 AS topping_count
     FROM 
         combo_cte
    )
WHERE 
    topping_count = 3
ORDER BY 
    total_cost DESC;

Solution

  • Given your query:

    WITH topping_list AS 
    (
        SELECT topping_name, ingredient_cost
        FROM pizza_toppings
    ),
    

    Then the first sub-query factoring clause is redundant as you are just creating an in-line view with the same columns and a different name. Similarly, generating the recursive query and then using post-processing to count the number of toppings is generating extra work and could be simplified by moving the counting into the recursive query. Which means the query can be simplified to something that is easier to explain:

    WITH combo_cte (pizza, total_cost, last_topping, topping_count) AS 
    (
      -- Anchor member: start with single topping
      SELECT topping_name AS pizza,
             ingredient_cost AS total_cost,
             topping_name AS last_topping,
             1 AS topping_count
      FROM   pizza_toppings
    UNION ALL
      -- Recursive member: build combinations by adding toppings in alphabetical order
      SELECT c.pizza || ',' || t.topping_name AS pizza,
             c.total_cost + t.ingredient_cost AS total_cost,
             t.topping_name AS last_topping,
             1 + c.topping_count AS topping_count
      FROM   combo_cte c
             INNER JOIN pizza_toppings t
             ON t.topping_name > c.last_topping  -- ensures unique combinations only
      -- Stop counting when the desired number of toppings is reached.
      WHERE  c.topping_count < 3
    )
    -- Filter only 3-topping combos and show total cost
    SELECT pizza,
           total_cost
    FROM   combo_cte
    WHERE  topping_count = 3
    ORDER BY total_cost DESC;
    

    To show how it works, instead of the final SELECT, we can retrieve all the rows using

    WITH combo_cte (pizza, total_cost, last_topping, topping_count) AS 
    (
       ...
    )
    SELECT *
    FROM   combo_cte
    

    Which for the "anchor member" part generates:

    PIZZA TOTAL_COST LAST_TOPPING TOPPING_COUNT
    Pepperoni .5 Pepperoni 1
    Sausage .7 Sausage 1
    Chicken .55 Chicken 1
    Extra Cheese .4 Extra Cheese 1

    Creating a single row for each ingredient and setting the last topping to the same value and having a single topping count.

    Then on the second, recursive pass, the "recursive member" part takes each row of the previous iteration and joins the ingredient table to it, if the new ingredient is later in the alphabet than the previous, generating:

    PIZZA TOTAL_COST LAST_TOPPING TOPPING_COUNT
    Pepperoni,Sausage 1.2 Sausage 2
    Chicken,Pepperoni 1.05 Pepperoni 2
    Chicken,Sausage 1.25 Sausage 2
    Chicken,Extra Cheese .95 Extra Cheese 2
    Extra Cheese,Pepperoni .9 Pepperoni 2
    Extra Cheese,Sausage 1.1 Sausage 2

    So for the pervious Pepperoni row, there is only a single ingredient, Sausage, later in the alphabet so that is joined generating a new row. For the Sausage row for the anchor members, there are no ingredients with later names so nothing is generated. Similarly, Chicken is combined with each of 3 ingredients and Extra Cheese with each of the 2 later ingredients.

    Then, it is repeated for a third, recursive pass. This time, it is the output of the 2nd recursive pass that is used to generate the rows:

    PIZZA TOTAL_COST LAST_TOPPING TOPPING_COUNT
    Chicken,Pepperoni,Sausage 1.75 Sausage 3
    Chicken,Extra Cheese,Pepperoni 1.45 Pepperoni 3
    Chicken,Extra Cheese,Sausage 1.65 Sausage 3
    Extra Cheese,Pepperoni,Sausage 1.6 Sausage 3

    Anything from the previous recursion that had the LAST_TOPPING of Sausage will not have any later ingredients to join to. The other rows (Chicken,Pepperoni, Chicken,Extra Cheese and Extra Cheese,Pepperoni) all have more ingredients that are later in the alphabet and these will generate additional rows for the third ingredient.

    The query will try to recurse for another pass but the filter WHERE c.topping_count < 3 will filter out all the rows so nothing is generated and this causes the query to stop recursing.


    If you comment out the WHERE filter then the query would continue generating toppings:

    PIZZA TOTAL_COST LAST_TOPPING TOPPING_COUNT
    Chicken,Extra Cheese,Pepperoni,Sausage 2.15 Sausage 4

    Then all combinations are generated and when it tried to generate a combinations for 5 ingredients there are no ingredients that are later in the alphabet than Sausage so nothing could be joined and the recursion will halt.

    But, since we only want 3 toppings then it is more efficient to use the WHERE filter to stop the recursion at the appropriate point.


    One you have that then all your full query is doing is filtering to those pizzas that have exactly 3 ingredients and ordering by cost, giving the final output of:

    PIZZA TOTAL_COST
    Chicken,Pepperoni,Sausage 1.75
    Chicken,Extra Cheese,Sausage 1.65
    Extra Cheese,Pepperoni,Sausage 1.6
    Chicken,Extra Cheese,Pepperoni 1.45

    fiddle