sqlamazon-athenaprestotrino

Compute a new column to increment/decrement by 1 per row, in respect to a certain row set with a condition


I would like to increment/decrement by 1 per row, when the starting point (0) is set by a condition in a specific row.

Example

Consider the following my_table:

WITH my_table AS (
    SELECT *
    FROM (VALUES
        (DATE '2023-02-01', 'red'),
        (DATE '2023-03-22', 'red'),
        (DATE '2023-03-30', 'red'),
        (DATE '2023-06-10', 'red'),
        (DATE '2023-06-11', 'red'),
        (DATE '2023-07-03', 'green'),
        (DATE '2023-07-09', 'green'),
        (DATE '2024-01-11', 'green'),
        (DATE '2024-02-11', 'yellow'), -- <~~~ I want this row to be my reference 
        (DATE '2024-02-12', 'yellow'),
        (DATE '2024-02-13', 'yellow'),
        (DATE '2024-02-14', 'yellow'),
        (DATE '2022-10-20', 'blue'),
        (DATE '2022-10-21', 'blue'),
        (DATE '2022-10-22', 'blue')
    ) AS t(my_date, color)
)

I want to create a new column called relative_idx, that will have the value of 0 where my_date = DATE('2024-02-11') AND color = 'yellow'. The rest of the records in relative_idx will have a value in relation to the 0.

Right now, I only know how to do:

SELECT my_date,
       color,
       IF(my_date = DATE('2024-02-11') AND color = 'yellow', 0, NULL) AS relative_idx
FROM my_table; 

-- my_date   |color |relative_idx|
-- ----------+------+------------+
-- 2023-02-01|red   |        NULL|
-- 2023-03-22|red   |        NULL|
-- 2023-03-30|red   |        NULL|
-- 2023-06-10|red   |        NULL|
-- 2023-06-11|red   |        NULL|
-- 2023-07-03|green |        NULL|
-- 2023-07-09|green |        NULL|
-- 2024-01-11|green |        NULL|
-- 2024-02-11|yellow|           0|
-- 2024-02-12|yellow|        NULL|
-- 2024-02-13|yellow|        NULL|
-- 2024-02-14|yellow|        NULL|
-- 2022-10-20|blue  |        NULL|
-- 2022-10-21|blue  |        NULL|
-- 2022-10-22|blue  |        NULL|

But my desired output is:

my_date color relative_idx
2023-02-01 red -8
2023-03-22 red -7
2023-03-30 red -6
2023-06-10 red -5
2023-06-11 red -4
2023-07-03 green -3
2023-07-09 green -2
2024-01-11 green -1
2024-02-11 yellow 0
2024-02-12 yellow 1
2024-02-13 yellow 2
2024-02-14 yellow 3
2022-10-20 blue 4
2022-10-21 blue 5
2022-10-22 blue 6

I use AWS Athena which relies on Trino SQL.


Solution

  • WITH my_table AS (
        SELECT *
        FROM (VALUES
            (DATE '2023-02-01', 'red'),
            (DATE '2023-03-22', 'red'),
            (DATE '2023-03-30', 'red'),
            (DATE '2023-06-10', 'red'),
            (DATE '2023-06-11', 'red'),
            (DATE '2023-07-03', 'green'),
            (DATE '2023-07-09', 'green'),
            (DATE '2024-01-11', 'green'),
            (DATE '2024-02-11', 'yellow'),
            (DATE '2024-02-12', 'yellow'),
            (DATE '2024-02-13', 'yellow'),
            (DATE '2024-02-14', 'yellow'),
            (DATE '2022-10-20', 'blue'),
            (DATE '2022-10-21', 'blue'),
            (DATE '2022-10-22', 'blue')
        ) AS t(my_date, color)
    ),
        my_table_with_idx AS (
            SELECT *, row_number() OVER() AS relative_idx
            FROM my_table
    ),  
        reference_row AS (
            SELECT relative_idx
            FROM my_table_with_idx
            WHERE my_date = date '2024-02-11'
            AND color = 'yellow'
    )
    SELECT *
         , relative_idx - (SELECT * FROM reference_row) AS relative_idx
    FROM my_table_with_idx