The goal is to extract metric value for the newest observation. For provided data:
CREATE OR REPLACE TABLE tab (id INT, col VARIANT)
AS
SELECT 1, [{'date':'2024-01-01', 'metric':1}, {'date':'2024-01-02', 'metric':2}] UNION ALL
SELECT 2, [{'date':'2024-01-12', 'metric':3}, {'date':'2024-01-11', 'metric':4}] UNION ALL
SELECT 3, [{'date':'2024-01-05', 'metric':5}, {'date':'2024-01-04', 'metric':6}, {'date':'2024-01-07', 'metric':7}];
Desired output:
Effectively searching for a way to perform equivalent of "MIN_BY/MAX_BY" operation on an array of objects, but without flattening it first.
WITH cte AS (
SELECT tab.id, tab.col,s.value:metric::INT AS metric, s.value:date::DATE AS date
FROM tab
,TABLE(FLATTEN(tab.col)) AS s(val)
)
SELECT id, col, MAX_BY(metric, date) AS col_newest_metric
FROM cte
GROUP BY id, col
ORDER BY id;
Using REDUCE function, OBJECT accumulator and conditional logic:
SELECT id, col,
REDUCE(col,
{'date':'0000-01-01'::DATE, 'metric':NULL},
(acc, x) -> iff(x:date>acc:date, {'date':x:date, 'metric':x:metric}, acc)
):metric::INT AS col_newest_metric
FROM tab;
Output: