hive

Explode hive table with null arrays in the column


I have a below table which has 5 columns. The column CLASS have data with empty arrays.

key arrival carrier class departure
50B1AE7A0C1BF0001521CEE ["2024-02-15T11:35:00.000+08:00","2024-02-20T15:10:00.000+08:00"] ["MU","MU"] [ ] ["2024-02-15T08:45:00.000+08:00","2024-02-20T12:35:00.000+08:00"]
0B88E4801F1B1000544C097 ["2024-07-25T09:00:00.000+04:00","2024-08-08T07:30:00.000+02:00"] ["TX","TX"] ["T","X"] ["2024-07-24T19:55:00.000+02:00","2024-08-07T22:10:00.000+04:00"]
AC3668A4A61C800015210B6 ["2024-04-08T20:40:00.000+03:00","2024-04-09T02:59:00.000+03:00","2024-06-29T06:40:00.000+03:00","2024-06-29T13:35:00.000+01:00"] ["MS","MS","MS","MS"] ["T","T"] ["2024-04-08T15:00:00.000+01:00","2024-04-08T23:40:00.000+03:00","2024-06-29T04:00:00.000+03:00","2024-06-29T10:10:00.000+03:00"]
86BF7C03C222300004DE0EE ["2024-01-21T22:35:00.000+06:00"] ["BG"] [ ] ["2024-01-21T21:15:00.000+05:30"]

I am expecting the data in below format:

key arrival carrier class departure
50B1AE7A0C1BF0001521CEE 2024-02-15T11:35:00.000+08:00 MU null 2024-02-15T08:45:00.000+08:00
50B1AE7A0C1BF0001521CEE 2024-02-20T15:10:00.000+08:00 MU null 2024-02-20T12:35:00.000+08:00
0B88E4801F1B1000544C097 2024-07-25T09:00:00.000+04:00 TX T 2024-07-24T19:55:00.000+02:00
0B88E4801F1B1000544C097 2024-08-08T07:30:00.000+02:00 TX X 2024-08-07T22:10:00.000+04:00
AC3668A4A61C800015210B6 2024-04-08T20:40:00.000+03:00 MS T 2024-04-08T15:00:00.000+01:00
AC3668A4A61C800015210B6 2024-04-09T02:59:00.000+03:00 MS T 2024-04-08T23:40:00.000+03:00
AC3668A4A61C800015210B6 2024-06-29T06:40:00.000+03:00 MS T 2024-06-29T04:00:00.000+03:00
AC3668A4A61C800015210B6 2024-06-29T13:35:00.000+01:00 MS T 2024-06-29T10:10:00.000+03:00
86BF7C03C222300004DE0EE 2024-01-21T22:35:00.000+06:00 BG null 2024-01-21T21:15:00.000+05:30

I have tried multiple approaches but everytime it does a crossjoin with all the rows. Can someone please help me to get the required output


Solution

  • Try this. It will work.

    select 
    f.key,
    exploded_arrival_table.exploded_arrival_col,
    exploded_carrier_table.exploded_carrier_col,
    exploded_class_table.exploded_class_col,
    exploded_departure_table.exploded_departure_col 
    from (select key, arrival, carrier, 
                 CASE WHEN size(class)=0 and sz=1 THEN array("NULL") 
                      WHEN size(class)=0 and sz=2 THEN array("NULL","NULL")
                      WHEN size(class)=0 and sz=3 THEN array("NULL","NULL","NULL") 
                      WHEN size(class)=0 and sz=4 THEN array("NULL","NULL","NULL","NULL") 
                      ELSE class END AS class, 
           departure from (select *, size(arrival) as sz flight) in_tbl_flight) f 
    LATERAL VIEW POSEXPLODE(f.arrival)   exploded_arrival_table as t_xat, exploded_arrival_col 
    LATERAL VIEW POSEXPLODE(f.carrier)   exploded_carrier_table as t_xct, exploded_carrier_col
    LATERAL VIEW POSEXPLODE(f.class)     exploded_class_table as  t_xcct, exploded_class_col
    LATERAL VIEW POSEXPLODE(f.departure) exploded_departure_table as t_xdct, exploded_departure_col 
    where t_xat=t_xct and t_xct=t_xcct and t_xcct=t_xdct;