sqlprestotrino

Presto: How to map arrays of different length?


Say I have arrays ['1', '2', '3'] and ['a', 'b', 'c', 'd'] and I want to map them

select map(array ['1', '2', '3'], array ['a', 'b', 'c', 'd'])

This will return an error saying that the arrays need to be of the same length.

How can I replicate the python's zip() which drops the ones without a pair? Or if not, pad the missing ones with NULLs?


Solution

  • You can use slice and cardinality to "fix" sizes:

    WITH dataset AS (
      SELECT * 
      FROM 
        (
          VALUES 
            (ARRAY [1, 2, 3], ARRAY[1, 2, 3, 4])
        ) AS t (arr1, arr2)
    ) 
    
    SELECT 
      map (
        slice(arr1, 1, m), 
        slice(arr2, 1, m)
      ) 
    FROM 
      (
        SELECT *, LEAST(cardinality(arr1), cardinality(arr2)) as m 
        FROM 
          dataset
      )
    

    Output:

    _col0
    {1=1, 2=2, 3=3}

    Or just use zip and transform the resuting array of ROW's into map (note, this relies on default naming convention for elements of ROW, as @Martin Traverso points out in comments in Trino you can access row fields by index, so you can change corresponding line to r -> r[1] IS NOT NULL):

    WITH dataset AS (
        SELECT * FROM (VALUES   
           (ARRAY [1,2,3], ARRAY[1,2,3,4])
     ) AS t (arr1, arr2))
    
    SELECT map_from_entries(filter(zip(arr1, arr2), r -> r.field0 is not NULL))
    FROM dataset
    

    Output:

    _col0
    {1=1, 2=2, 3=3}