sqlrecursiongoogle-bigquery

How to calculate a unit price with a recursive Tree structure in a database


I’m working with a database that has two key columns:

The issue I’m facing is calculating the unit price for these articles. Sometimes, the unit price for a name_article is NULL. In such cases, I need to find its price by looking at the recipe column, where the recipe value matches the name_article. Then, I need to sum up the prices of the articles related to that name_article.

The complication is that this recipe can itself contain sub-recipes, which creates a recursive structure. I need to handle this recursion to compute the correct unit price at all levels of the sub-recipes.

Here’s an example of the data I'm working with:

recipe name_article id_establishment quantity price_unit_ht_2
X Z 2 0.3 NULL
Z W 2 0.02 2.4275
Z F 2 0.01 NULL
F A 2 0.08 4.2
F B 2 0.1 NULL
F C 2 0.2 NULL
C J 2 0.4 2.3
B K 2 0.2 1.56

For example, the unit price for Z is NULL, so I need to calculate it by summing the prices of its ingredients, W and F. However, the price for F is also NULL, which means I need to look at F's ingredients (A, B, and C). Since B and C also have NULL prices, I continue the process: B is linked to K and C is linked to J, both of which have unit prices.

In this case, the unit price for Z would be the sum of the unit prices of W, A, K, and J.

Here's the expected output: Here is the updated table with the calculated unit prices:

recipe name_article id_establishment quantity price_unit_ht_2
X Z 2 0.30 10.4875
Z W 2 0.02 2.4275
Z F 2 0.01 8.0600
F A 2 0.08 4.2000
F B 2 0.10 1.5600
F C 2 0.20 2.3000
C J 2 0.40 2.3000
B K 2 0.20 1.5600

How can I handle this recursive structure to calculate the correct unit price for each article using BQuery or SQL ?

Any help would be greatly appreciated 🙏🥺


Solution

  • Recurive CTEs are tough to wrap your head around at first. In this case we want to select all of the records in the non-recursive component (the first SELECT) and then link the name_article coming out of that result set to the recipe in the original table.

    When I build these I like to stitch together a path that shows me how the record was derived as well as a depth to know how deep I went to get there. These are superfluous for your needs, but they provide a nice map for us humans. Here I also use it to ensure there is no cycling (where a hierarchy refers back to an already included element causing an endless loop).

    Consider the following:

    WITH RECURSIVE rcte AS 
    (
      SELECT recipe as the_recipe,
             recipe as parent, 
             name_article as child,
             quantity,
             price_unit_ht_2,
             recipe || '>' || name_article as path,
             0 as depth
      FROM recipes
      UNION ALL
      SELECT
          rcte.the_recipe,
          r.recipe,
          r.name_article,
          r.quantity,
          r.price_unit_ht_2,
          rcte.path || '>' || r.name_article,
          rcte.depth + 1
      FROM rcte
          INNER JOIN recipes r 
          ON rcte.child = r.recipe
          WHERE rcte.path NOT LIKE '%' || r.name_article || '%'
    )
    SELECT the_recipe, child, quantity, price_unit_ht_2, path, depth
    FROM rcte
    WHERE price_unit_ht_2 IS NOT NULL
    ORDER BY the_recipe, child
    

    +------------+-------+----------+-----------------+-----------+-------+
    | the_recipe | child | quantity | price_unit_ht_2 |   path    | depth |
    +------------+-------+----------+-----------------+-----------+-------+
    | B          | K     |     0.20 |          1.5600 | B>K       |     0 |
    | C          | J     |     0.40 |          2.3000 | C>J       |     0 |
    | F          | A     |     0.08 |          4.2000 | F>A       |     0 |
    | F          | J     |     0.40 |          2.3000 | F>C>J     |     1 |
    | F          | K     |     0.20 |          1.5600 | F>B>K     |     1 |
    | X          | A     |     0.08 |          4.2000 | X>Z>F>A   |     2 |
    | X          | J     |     0.40 |          2.3000 | X>Z>F>C>J |     3 |
    | X          | K     |     0.20 |          1.5600 | X>Z>F>B>K |     3 |
    | X          | W     |     0.02 |          2.4275 | X>Z>W     |     1 |
    | Z          | A     |     0.08 |          4.2000 | Z>F>A     |     1 |
    | Z          | J     |     0.40 |          2.3000 | Z>F>C>J   |     2 |
    | Z          | K     |     0.20 |          1.5600 | Z>F>B>K   |     2 |
    | Z          | W     |     0.02 |          2.4275 | Z>W       |     0 |
    +------------+-------+----------+-----------------+-----------+-------+
    

    As mentioned in the comments, this shapes the data in such a way that only the base ingredients for each recipe is outputted (not subrecipes). You can change that last SELECT statement to aggregate to get total unit prices for each recipe like so:

    WITH RECURSIVE rcte AS 
    (
      SELECT recipe as the_recipe,
             recipe as parent, 
             name_article as child,
             quantity,
             price_unit_ht_2,
             recipe || '>' || name_article as path,
             0 as depth
      FROM recipes
      UNION ALL
      SELECT
          rcte.the_recipe,
          r.recipe,
          r.name_article,
          r.quantity,
          r.price_unit_ht_2,
          rcte.path || '>' || r.name_article,
          rcte.depth + 1
      FROM rcte
          INNER JOIN recipes r 
          ON rcte.child = r.recipe
          WHERE rcte.path NOT LIKE '%' || r.name_article || '%'
    )
    SELECT the_recipe, sum(quantity), sum(price_unit_ht_2)
    FROM rcte
    WHERE price_unit_ht_2 IS NOT NULL
    GROUP BY the_recipe
         
    

    +------------+------+---------+
    | the_recipe | sum  |   sum   |
    +------------+------+---------+
    | B          | 0.20 |  1.5600 |
    | C          | 0.40 |  2.3000 |
    | X          | 0.70 | 10.4875 |
    | Z          | 0.70 | 10.4875 |
    | F          | 0.68 |  8.0600 |
    +------------+------+---------+
    

    Note that the recursive CTE does not change since it's in charge of deriving the result shared in the first SQL which can then drive the rest of your analysis like this aggregation.