sqlhivehiveqlhue

Get full data view for two tables in Hive?


I have two tables in Hive (arch and noarch) with the following structure:

Table1Arch Table2NoArch
tr_id tr_id
res_id res_id
info_json info_json
created_at
updated_at

I need to get a full data view arch + noarch and join them by res_id.

I tried to do different variations of left joins but I was getting either results from arch and nothing from noarch or vice versa. I guess I should have used union all, but struggling to write it correctly.

Could you please help me with the right query?

EDIT:

I'd like to get kinda united view for these two tables when I search for particular res_ids.

Say I have the following data:

Entry1Arch Entry2Arch
1 2
111 222
{"something 1"} {"something 2"}
Entry3NoArch Entry4NoArch
3 4
333 444
{"something 3"} {"something 4"}
2021-10-03 21:01:44.0 2021-10-04 21:02:43.0
2021-10-03 21:01:44.0 2021-10-04 21:02:43.0

The ultimate goal is to get full data from both tables: 111 + 222 + 333 + 444).


Solution

  • You can use UNION ALL:

    select tr_id, res_id, info_json, created_at, updated_at, src
    from
    (select tr_id, res_id, info_json, created_at, updated_at, 'NoArch' as src 
      from Table2NoArch
    
    union all
    
    select tr_id, res_id, info_json, null created_at, null updated_at, 'Arch' as src 
      from Table1Arch
    )u
    where res_id in (111,333,444)
    

    created_at and updated_at are absent in one Table1Arch, NULLs are selected, you can use current_timestamp or current_date instead.

    Added src column, so you can easily find out the source of data.