I have two tables in Hive (arch
and noarch
) with the following structure:
Table1Arch | Table2NoArch |
---|---|
tr_id | tr_id |
res_id | res_id |
info_json | info_json |
created_at | |
updated_at |
I need to get a full data view arch
+ noarch
and join them by res_id
.
I tried to do different variations of left joins but I was getting either results from arch
and nothing from noarch
or vice versa. I guess I should have used union all
, but struggling to write it correctly.
Could you please help me with the right query?
EDIT:
I'd like to get kinda united view for these two tables when I search for particular res_id
s.
Say I have the following data:
Entry1Arch | Entry2Arch |
---|---|
1 | 2 |
111 | 222 |
{"something 1"} | {"something 2"} |
Entry3NoArch | Entry4NoArch |
---|---|
3 | 4 |
333 | 444 |
{"something 3"} | {"something 4"} |
2021-10-03 21:01:44.0 | 2021-10-04 21:02:43.0 |
2021-10-03 21:01:44.0 | 2021-10-04 21:02:43.0 |
The ultimate goal is to get full data from both tables: 111 + 222 + 333 + 444).
You can use UNION ALL:
select tr_id, res_id, info_json, created_at, updated_at, src
from
(select tr_id, res_id, info_json, created_at, updated_at, 'NoArch' as src
from Table2NoArch
union all
select tr_id, res_id, info_json, null created_at, null updated_at, 'Arch' as src
from Table1Arch
)u
where res_id in (111,333,444)
created_at and updated_at are absent in one Table1Arch, NULLs are selected, you can use current_timestamp or current_date instead.
Added src column, so you can easily find out the source of data.