snowflake-cloud-data-platform

ARRAYS_TO_OBJECT with duplicate handling


ARRAYS_TO_OBJECT function merges two arrays into OBJECT.

Returns an OBJECT that contains the keys specified by one input ARRAY and the values specified by another input ARRAY.

ARRAYS_TO_OBJECT( <key_array> , <value_array> )

SELECT ARRAYS_TO_OBJECT(['key1', 'key2', 'key3'], [1, 2, 3]);
{"key1": 1,"key2": 2, "key3": 3}  

The challenge is that the key_array cannot contain duplicates:

SELECT ['blue', 'green', 'blue', 'red'] AS arr1,
       [1,2,3,4]                        AS arr2,
       ARRAYS_TO_OBJECT(arr1, arr2)     AS result;
-- Error:
-- Duplicate field key 'blue'

On the other hand PARSE_JSON(expr, 'd') is capable of resolving duplicates:

Parameter Description
d Allow duplicate keys in JSON objects. If a JSON object contains a duplicate key, the returned object has a single instance of that key with the last value specified for that key.
s Don’t allow duplicate keys in JSON objects (strict). This value is the default.

Seeking for SQL approach to handle merging two arrays with possible "duplicates" in the key_array.


Solution

  • ARRAYS_TO_OBJECT with duplicate handling can be emulated:

    Code:

    SELECT 
     ['blue', 'green', 'blue', 'red'] AS arr1,
     [1,2,3,4] AS arr2,
     REDUCE(ARRAYS_ZIP(arr1, arr2),{},(acc,x) -> OBJECT_INSERT(acc, x:$1::TEXT, x:$2, TRUE)) AS result;
    

    Output:

    enter image description here