sqljoin

Prevent duplicate values in LEFT JOIN


I faced a situation where I got duplicate values from LEFT JOIN. I think this might be a desired behavior but unlike from what I want.

I have three tables: person, department and contact.

person :

id bigint,
person_name character varying(255)

department :

person_id bigint,
department_name character varying(255)

contact :

person_id bigint,
phone_number character varying(255)

Sql Query :

SELECT p.id, p.person_name, d.department_name, c.phone_number 
FROM person p
  LEFT JOIN department d 
    ON p.id = d.person_id
  LEFT JOIN contact c 
    ON p.id = c.person_id;

Result :

id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Finance"      |"99478"
1 |"John"     |"Finance"      |"67890"
1 |"John"     |"Marketing"    |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |"Marketing"    |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"

I know it's what joins do, keeping multiplied with selected rows. But It gives a sense like phone numbers 023451,99478,67890 are for both departments while they are only related to person john with unnecessary repeated values which will escalate the problem with larger data set.
So, here is what I want:

id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |""             |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"

This is a sample of my situation and I am using a large set of tables and queries. So, kind of need a generic solution.


Solution

  • I like to call this problem "cross join by proxy". Since there is no information (WHERE or JOIN condition) how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person - giving you the Cartesian product. Very similar to this one:

    More explanation there.

    Solution for your query:

    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM   person p
    LEFT   JOIN (
       SELECT person_id, min(department_name) AS department_name
       FROM   department
       GROUP  BY person_id
       ) d ON d.person_id = p.id
    LEFT   JOIN (
       SELECT person_id, min(phone_number) AS phone_number
       FROM   contact
       GROUP  BY person_id
       ) c ON c.person_id = p.id;
    

    You did not define which department or phone number to pick, so I arbitrarily chose the minimum. You can have it any other way ...