hivehiveql

Keeping the order of records in Hive collect


I have a HIVE table as follows:

select id, id_2, val from test order by id;

234 974 0.5
234 457 0.7
234 236 0.5
234 859 0.6
123 859 0.7
123 236 0.6
123 974 0.5
123 457 0.5

I'm trying to collect the data based on the id value. I need the collected data to be following the same order for each row. My expected output is as follows: (any order is fine as long it's the same for all the rows):

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [974,457,236,859]   [0.5,0.5,0.6,0.7]

I used the collect UDF from Brickhouse.

select tmp.id, collect(id_2), collect(tmp.val) from
(select id, id_2, val from test
order by id) tmp
group by tmp.id
;

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [859,236,974,457]   [0.7,0.6,0.5,0.5]

As you can see, the order of the columns is not being kept. Is there any way to keep the ordering constant throughout the output? Any tips will be appreciated.


Solution

  • Use this query

    select tmp.id, collect(id_2), collect(tmp.val) from
    (select id, id_2, val from test
    order by id desc, id_2 desc) tmp
    group by tmp.id
    ;
    

    Output as below,

    234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
    123 [974,457,236,859]   [0.5,0.5,0.6,0.7]
    

    Basically modified

    order by id
    

    to

       order by id desc, id_2 desc