Currently, I have code like this:
WITH
foo AS (
SELECT 1 AS a, "hello" AS b, NULL AS c, 10 AS some_timestamp
UNION ALL
SELECT 1, NULL, "world", 11
UNION ALL
SELECT 2, "hi", "!", 99
)
SELECT
a,
ANY_VALUE(b HAVING MAX if(b is null, null, some_timestamp)) AS latest_b,
ANY_VALUE(c HAVING MAX if(c is null, null, some_timestamp)) AS latest_c
FROM
foo
GROUP BY
a
The lines with ANY_VALUE
are quite repetitive. Is there a way to abstract this logic away, maybe with a UDF?
In the end, it would be awesome if I could replace
ANY_VALUE(b HAVING MAX if(b is null, null, some_timestamp)) AS latest_b,
ANY_VALUE(c HAVING MAX if(c is null, null, some_timestamp)) AS latest_c
with something like that:
latest_non_null(b, some_timestamp) AS latest_b,
latest_non_null(c, some_timestamp) AS latest_c
You can create a custom user-defined aggregation function (UDAFs), currently in preview:
CREATE OR REPLACE AGGREGATE FUNCTION `project.dataset.latest_non_null`(
value STRING,
some_timestamp INTEGER)
RETURNS STRING
AS (
ANY_VALUE(value HAVING MAX if(value is null, null, some_timestamp))
);
SELECT
a,
project.dataset.latest_non_null(b, some_timestamp) AS latest_b,
project.dataset.latest_non_null(c, some_timestamp) AS latest_c,
FROM
foo
GROUP BY
a
Output: