sqlpostgresqlgaps-and-islands

Compute column value based on previous row recursively


Given the following scheme and values I want to compute the value of each row starting by adding 1 to the previous row value. If the row value is null then set null. If there is a value then add 1 from the previous value.

How can I achieve this in Postgres? I've tried recursive CTE but without success.

Input data:

symbol  date                    close   computed
TBBB    2024-02-22 14:30:00+00  19.05   
TBBB    2024-02-23 14:30:00+00  19.55   
TBBB    2024-02-24 14:30:00+00  20.6    
TBBB    2024-02-25 14:30:00+00  21.3    
TBBB    2024-02-26 14:30:00+00  20.43   20.4
TBBB    2024-02-27 14:30:00+00  20.21   
TBBB    2024-02-28 14:30:00+00  20.74   
TBBB    2024-02-29 14:30:00+00  20.09   
TBBB    2024-03-01 14:30:00+00  20.79   
TBBB    2024-03-02 14:30:00+00  20.87   
TBBB    2024-03-03 14:30:00+00  20.69   
TBBB    2024-03-04 14:30:00+00  20.19   
TBBB    2024-03-05 14:30:00+00  20.9    
TBBB    2024-03-06 14:30:00+00  20.99   
TBBB    2024-03-07 14:30:00+00  21.28   
TBBB    2024-03-08 14:30:00+00  21.27   

Expected result:

symbol  date                    close   computed
TBBB    2024-02-22 14:30:00+00  19.05   
TBBB    2024-02-23 14:30:00+00  19.55   
TBBB    2024-02-24 14:30:00+00  20.6    
TBBB    2024-02-25 14:30:00+00  21.3    
TBBB    2024-02-26 14:30:00+00  20.43   20.4
TBBB    2024-02-27 14:30:00+00  20.21   21.4
TBBB    2024-02-28 14:30:00+00  20.74   22.4
TBBB    2024-02-29 14:30:00+00  20.09   23.4
TBBB    2024-03-01 14:30:00+00  20.79   24.4
TBBB    2024-03-02 14:30:00+00  20.87   25.4
TBBB    2024-03-03 14:30:00+00  20.69   26.4
TBBB    2024-03-04 14:30:00+00  20.19   27.4
TBBB    2024-03-05 14:30:00+00  20.9    28.4
TBBB    2024-03-06 14:30:00+00  20.99   29.4
TBBB    2024-03-07 14:30:00+00  21.28   30.4
TBBB    2024-03-08 14:30:00+00  21.27   31.4

The +1 from previous value is just for simplicity. In reality I would have a math formula there.

What I've tried:

Some simpler example and not related to the question: here I calculate the computed column based on the close. This was a initial example, it's not what I want:

WITH computed_values AS (SELECT symbol,
                           date,
                           close,
                           CASE
                               WHEN LAG(computed, 5) OVER (PARTITION BY symbol ORDER BY date) IS NULL THEN null
                               ELSE (close * (2.0 / (period + 1))) +
                                    (LAG(computed, 5) OVER (PARTITION BY symbol ORDER BY date) * (1 - (2.0 / (period + 1))))
                               END AS computed
                    FROM (SELECT symbol,
                                 date,
                                 close,
                                 close AS computed,
                                 5     AS period
                          FROM my_table
                          WHERE symbol = 'TBBB' 
                          ORDER BY date) subquery)
SELECT *
FROM computed_values;

Generated by AI and not working:

WITH NumberedRows AS (
    SELECT 
        symbol, 
        date, 
        close,
        ROW_NUMBER() OVER (ORDER BY date ASC) AS row_num
    FROM my_table
),
KnownClose AS (
    SELECT 
        symbol, 
        date, 
        close,
        row_num
    FROM NumberedRows
    WHERE close IS NOT NULL
),
RecursiveClose AS (
    SELECT 
        symbol,
        date,
        close,
        row_num
    FROM KnownClose
    UNION ALL
    SELECT
        nr.symbol,
        nr.date,
        rc.close - 2 AS close,
        nr.row_num
    FROM RecursiveClose rc
    JOIN NumberedRows nr ON nr.row_num = rc.row_num - 1
    WHERE nr.close IS NULL
    
    UNION ALL

    SELECT
        nr.symbol,
        nr.date,
        rc.close + 2 AS close,
        nr.row_num
    FROM RecursiveClose rc
    JOIN NumberedRows nr ON nr.row_num = rc.row_num + 1
    WHERE nr.close IS NULL
) SELECT symbol,
    date,
    close
FROM RecursiveClose
ORDER BY row_num;

Solution

  • This is one of the rare cases that lends itself to a procedural solution.
    Replace all instances of tbl with your actual (schema-qualified) table name:

    -- create function
    CREATE OR REPLACE FUNCTION f_compute()
      RETURNS SETOF tbl
      LANGUAGE plpgsql AS
    $func$
    DECLARE
       r  tbl; -- use table type as row variable
       r0 tbl;
    BEGIN
       FOR r IN
          SELECT * FROM tbl t ORDER BY t.symbol, t.date
       LOOP
          IF r.symbol = r0.symbol            -- same symbol
          AND r0.computed IS NOT NULL        -- last computed exists
          AND r.computed IS NULL THEN        -- this computed is null
             r.computed := r0.computed + 1;  -- your computation here
          END IF;
          RETURN NEXT r;
          r0 := r;                           -- remember last row
       END LOOP;
    END
    $func$;
    

    Call:

    SELECT * FROM f_compute();
    

    fiddle

    Produces exactly your desired result. Compare performance with EXPLAIN ANALYZE.

    Related:

    If you don't want to persist a function, consider a temporary function. See: