snowflake-cloud-data-platform

Snowflake - row level MIN_BY/MAX BY on a semi-structuctured data type


The goal is to extract metric value for the newest observation. For provided data:

CREATE OR REPLACE TABLE tab (id INT, col VARIANT)
AS
SELECT 1, [{'date':'2024-01-01', 'metric':1}, {'date':'2024-01-02', 'metric':2}] UNION ALL
SELECT 2, [{'date':'2024-01-12', 'metric':3}, {'date':'2024-01-11', 'metric':4}] UNION ALL
SELECT 3, [{'date':'2024-01-05', 'metric':5}, {'date':'2024-01-04', 'metric':6}, {'date':'2024-01-07', 'metric':7}];

Desired output:

enter image description here

Effectively searching for a way to perform equivalent of "MIN_BY/MAX_BY" operation on an array of objects, but without flattening it first.

WITH cte AS (
    SELECT tab.id, tab.col,s.value:metric::INT AS metric, s.value:date::DATE AS date
    FROM tab
    ,TABLE(FLATTEN(tab.col)) AS s(val)
)
SELECT id, col, MAX_BY(metric, date) AS col_newest_metric
FROM cte
GROUP BY id, col
ORDER BY id;

Solution

  • Using REDUCE function, OBJECT accumulator and conditional logic:

    SELECT id, col,
        REDUCE(col, 
              {'date':'0000-01-01'::DATE, 'metric':NULL}, 
              (acc, x) -> iff(x:date>acc:date, {'date':x:date, 'metric':x:metric}, acc)
        ):metric::INT AS col_newest_metric
    FROM tab;
    

    Output:

    Desired Output