I have a below table which has 5 columns. The column CLASS have data with empty arrays.
key | arrival | carrier | class | departure |
---|---|---|---|---|
50B1AE7A0C1BF0001521CEE | ["2024-02-15T11:35:00.000+08:00","2024-02-20T15:10:00.000+08:00"] | ["MU","MU"] | [ ] | ["2024-02-15T08:45:00.000+08:00","2024-02-20T12:35:00.000+08:00"] |
0B88E4801F1B1000544C097 | ["2024-07-25T09:00:00.000+04:00","2024-08-08T07:30:00.000+02:00"] | ["TX","TX"] | ["T","X"] | ["2024-07-24T19:55:00.000+02:00","2024-08-07T22:10:00.000+04:00"] |
AC3668A4A61C800015210B6 | ["2024-04-08T20:40:00.000+03:00","2024-04-09T02:59:00.000+03:00","2024-06-29T06:40:00.000+03:00","2024-06-29T13:35:00.000+01:00"] | ["MS","MS","MS","MS"] | ["T","T"] | ["2024-04-08T15:00:00.000+01:00","2024-04-08T23:40:00.000+03:00","2024-06-29T04:00:00.000+03:00","2024-06-29T10:10:00.000+03:00"] |
86BF7C03C222300004DE0EE | ["2024-01-21T22:35:00.000+06:00"] | ["BG"] | [ ] | ["2024-01-21T21:15:00.000+05:30"] |
I am expecting the data in below format:
key | arrival | carrier | class | departure |
---|---|---|---|---|
50B1AE7A0C1BF0001521CEE | 2024-02-15T11:35:00.000+08:00 | MU | null | 2024-02-15T08:45:00.000+08:00 |
50B1AE7A0C1BF0001521CEE | 2024-02-20T15:10:00.000+08:00 | MU | null | 2024-02-20T12:35:00.000+08:00 |
0B88E4801F1B1000544C097 | 2024-07-25T09:00:00.000+04:00 | TX | T | 2024-07-24T19:55:00.000+02:00 |
0B88E4801F1B1000544C097 | 2024-08-08T07:30:00.000+02:00 | TX | X | 2024-08-07T22:10:00.000+04:00 |
AC3668A4A61C800015210B6 | 2024-04-08T20:40:00.000+03:00 | MS | T | 2024-04-08T15:00:00.000+01:00 |
AC3668A4A61C800015210B6 | 2024-04-09T02:59:00.000+03:00 | MS | T | 2024-04-08T23:40:00.000+03:00 |
AC3668A4A61C800015210B6 | 2024-06-29T06:40:00.000+03:00 | MS | T | 2024-06-29T04:00:00.000+03:00 |
AC3668A4A61C800015210B6 | 2024-06-29T13:35:00.000+01:00 | MS | T | 2024-06-29T10:10:00.000+03:00 |
86BF7C03C222300004DE0EE | 2024-01-21T22:35:00.000+06:00 | BG | null | 2024-01-21T21:15:00.000+05:30 |
I have tried multiple approaches but everytime it does a crossjoin with all the rows. Can someone please help me to get the required output
Try this. It will work.
select
f.key,
exploded_arrival_table.exploded_arrival_col,
exploded_carrier_table.exploded_carrier_col,
exploded_class_table.exploded_class_col,
exploded_departure_table.exploded_departure_col
from (select key, arrival, carrier,
CASE WHEN size(class)=0 and sz=1 THEN array("NULL")
WHEN size(class)=0 and sz=2 THEN array("NULL","NULL")
WHEN size(class)=0 and sz=3 THEN array("NULL","NULL","NULL")
WHEN size(class)=0 and sz=4 THEN array("NULL","NULL","NULL","NULL")
ELSE class END AS class,
departure from (select *, size(arrival) as sz flight) in_tbl_flight) f
LATERAL VIEW POSEXPLODE(f.arrival) exploded_arrival_table as t_xat, exploded_arrival_col
LATERAL VIEW POSEXPLODE(f.carrier) exploded_carrier_table as t_xct, exploded_carrier_col
LATERAL VIEW POSEXPLODE(f.class) exploded_class_table as t_xcct, exploded_class_col
LATERAL VIEW POSEXPLODE(f.departure) exploded_departure_table as t_xdct, exploded_departure_col
where t_xat=t_xct and t_xct=t_xcct and t_xcct=t_xdct;