sqlprestotrinocross-joinunnest

Splitting columns of type MAP(int, Array) by other array column


Let's assume that I have a table containing four columns

  ID_list               array(int)
  name_list             array(string)
  profile_scores        map(int, array(map(int, float)))
  profile_scores2       map(int, array(array(int)))

For instance

ID_list      name_list       profile_scores               profile_scores2
[1, 2]       ['A', 'B']      {11: [{0:10.5}, {1:10.2}]}      {7: [[4, 5], [7, 8, 9]]}

the key of profile_scores, profile_scores2 columns are unique. There's only one key for those columns. However, the array contains as many elements as ID_list.

Basically in the above case, the below four are equal

   Length(ID_list)
   Length(name_list)
   Length(profile_scores[11])
   Length(profile_scores2[7])

I want to split it, so that the each element of ID_list is a separate row. For instance, in the above case

 ID_list      name_list       profile_scores               profile_scores2
 1             'A'            {11: {0:10.5}}               {7: [4, 5]}
 2             'B'            {11: {1:10.2}}               {7: [7, 8, 9]}

so the new data types would be

  ID_list               int
  name_list             string
  profile_scores        map(int, map(int, float))
  profile_scores2       map(int, array(int))

I feel like CROSS JOIN / UNNEST could work, but I'm not sure how to preserve the key for profile_scores, profile_scores2 columns and just split the array in the value.

Is this operation doable in SQL? If so, how should I apply CROSS JOIN / UNNEST (or some other operations) to complete this?


Solution

  • Presto/Trino allows passing multiple columns to the unnest:

    UNNEST can also be used with multiple arguments, in which case they are expanded into multiple columns, with as many rows as the highest cardinality argument (the other columns are padded with nulls)

    select id, name, profile_scores, profile_scores2
    from table
       , unnest(id_list, name_list) as t(id, name);   
    

    which basically result in the unnested columns flattened based on the index (what seems exactly what you need).

    UPD

    Presto allows also flattening maps into key-value pairs (tested with Trino):

    select id, name, k, v, k1, v1
    from table
       , unnest(id_list, name_list, profile_scores, profile_scores2) as t(id, name, k, v, k1, v1);