Here is the first table A
in Redshift:
team origin_id targte_id
a 1 11
b 2 22
b NULL 33
c 5 55
c NULL 66
The origin_id
and target_id
could be NULL, but they couldn't both NULL in one row.
And the second table B
:
team origin_id targte_id content
a 1 11 aaa
a 1 11 bbb
b NULL 22 xxx
c 5 NULL zzz
I want to join these 2 tables and match the value of content
to the table A
. The match rule is: A.team_id
=B.team_id
; if the origin_id
is not null, then A.origin_id
=B.origin_id
; if the origin_id
is null, then A.target_id
=B.target_id
. And the row count of the final result should be the same as the table A
.
The final result set I wanted is:
team origin_id targte_id content
a 1 11 aaa
b 2 22 xxx
b NULL 33 NULL
c 5 55 zzz
c NULL 66 NULL
(For the first row team=a
, it matches 2 row in table B
; just choose any one in these matched rows, and it is OK.)
What's the correct SQL to achieve this in Redshift? I try left join
but the result looks like a Cartesian Product of these 2 tables.
Left join is correct. However, in your example where
For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.
There is no way to "just choose any one", a JOIN operation will match both rows. It is up to you to de-dupe the result set afterwards.
For example, you can group by A.team_id, A.origin_id, A.target_id:
SELECT
team_id,
origin_id,
target_id,
MIN(content) any_content
FROM (
SELECT
A.team_id,
A.origin_id,
A.target_id,
B.content
FROM
A LEFT JOIN B
ON
A.team_id = B.team_id AND
COALESCE(A.origin_id, 'is_null') = COALESCE(B.origin_id, 'is_null') AND
COALESCE(A.target_id, 'is_null') = COALESCE(B.target_id, 'is_null')
)
GROUP BY
team_id,
origin_id,
target_id