sqlpostgresqlrecursion

How does a Recursive CTE know to only process records added during the last iteration (and not looking at all of them)?


Consider this [v good] example from Using recursive CTE to generate a range of data

WITH RECURSIVE date_range AS (
  SELECT '2023-01-01'::timestamp  AS date
  UNION ALL
  SELECT date + interval '1 day'
  FROM date_range
  WHERE date < '2023-12-31'
)
SELECT *
FROM date_range;

I can understand how to create this query, but do not understand the underlying logic/rules executed by the database engine. Specifically: for the select date + 'interval 1 day' from date_range: how does it know to only "consider" the very last added record [instead of taking all the previously added records and adding 1 to each of them]?

To illustrate more clearly: consider when we are at 2023-01-03 in the recursion. In that case we already presumably have 2023-01-01, 2023-01-02, and 2023-01-03 in the table. Why does the second query shown below not "find" three records and add 1 to all of those 3 records?

  SELECT date + interval '1 day'
  FROM date_range
  WHERE date < '2023-12-31'

Instead what is happening is the engine only "finds" the very last record. 2023-01-03 and adds one to that to generate the single new record 2023-01-04? Is that just something I take on "faith" for recursive views?


Solution

  • consider when we are at 2023-01-03 in the recursion. In that case we already presumably have 2023-01-01, 2023-01-02, and 2023-01-03 in the table. Why does the second query shown below not "find" three records and add 1 to all of those 3 records

    Because there are 3 different tables (tuplestores) involved. Results keep getting accumulated and that's where the 2023-01-01, 2023-01-02, and 2023-01-03 live - the query doesn't target that. Instead it targets work_table which gets wiped and replaced with contents of intermediate_table from previous iteration - at that point it only holds 2023-01-03:

    iteration work_table intermediate_table results
    1 ('2023-01-01')
    non-recursive term
    ('2023-01-01'+interval '1 day') ('2023-01-01'),
    ('2023-01-02')
    2 ('2023-01-02')
    previous values from intermediate
    ('2023-01-02'+interval '1 day') ('2023-01-01'),
    ('2023-01-02'),
    ('2023-01-03')
    3 ('2023-01-03')
    previous values from intermediate
    ('2023-01-03'+interval '1 day') ('2023-01-01'),
    ('2023-01-02'),
    ('2023-01-03'),
    ('2023-01-04')

    Behaviour above illustrates the description from the doc:

    1. Evaluate the non-recursive term. (...) Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

    2. So long as the working table is not empty, repeat these steps:

      a) Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. (...) Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

      b) Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table

    The same description in src/backend/executor/nodeRecursiveunion.c responsible for the feature, sounds a bit more cryptic:

     * 1. evaluate non recursive term and assign the result to RT    //ok
     *
     * 2. execute recursive terms                                    //ok
     *
     * 2.1 WT := RT                                                  //ok
     * 2.2 while WT is not empty repeat 2.3 to 2.6. if WT is empty returns RT
     * 2.3 replace the name of recursive term with WT
     * 2.4 evaluate the recursive term and store into WT
     * 2.5 append WT to RT
     * 2.6 go back to 2.2
    

    But the code really does what it said on the tin, how it said it. Here's the 2.2, 2.3, 2.6 part, even matching the names from the doc:

        /* 2. Execute recursive term */
        for (;;)
        {
            slot = ExecProcNode(innerPlan);
            if (TupIsNull(slot))
            {
                /* Done if there's nothing in the intermediate table */
                if (node->intermediate_empty)
                    break;
    
                /* done with old working table ... */
                tuplestore_end(node->working_table);
    
                /* intermediate table becomes working table */
                node->working_table = node->intermediate_table;
    
                /* create new empty intermediate table */
                node->intermediate_table = tuplestore_begin_heap(false, false,
                                                                 work_mem);
                node->intermediate_empty = true;
    
                /* reset the recursive term */
                innerPlan->chgParam = bms_add_member(innerPlan->chgParam,
                                                     plan->wtParam);
    
                /* and continue fetching from recursive term */
                continue;
            }
    

    If you think about it, the "taking all the previously added records and adding 1 to each of them" thing is exactly what it does. All added previously, as in previous iteration, not all processed since the beginning of operation. In your case, it started with a single row, so in each iteration all previously added records amount to 1. If you started with 2, you'd get 2 at a time, for as long as both match the WHERE condition.