sqlpostgresqljoinleft-joinwhere-clause

Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail


In this candidate answer it is asserted that JOIN is better than LEFT JOIN under some circumstances involving some WHERE clauses because it does not confuse the query planner and is not "pointless". The assertion/assumption is that it should be obvious to anyone.

Please explain further or provide link(s) for further reading.


Solution

  • Consider the following example. We have two tables, DEPARTMENTS and EMPLOYEES.

    Some departments do not yet have any employees.

    This query uses an inner join that finds the department employee 999 works at, if any, otherwise it shows nothing (not even the employee or his or her name):

    select a.department_id, a.department_desc, b.employee_id, b.employee_name
      from departments a
      join employees b
        on a.department_id = b.department_id
     where b.employee_id = '999'
    

    This next query uses an outer join (left between departments and employees) and finds the department that employee 999 works for. However it too will not show the employee's ID or his or her name, if they do not work at any departments. That is because of the outer joined table being used in the WHERE clause. If there is no matching department, it will be null (not 999, even though 999 exists in employees).

    select a.department_id, a.department_desc, b.employee_id, b.employee_name
      from departments a
      left join employees b
        on a.department_id = b.department_id
     where b.employee_id = '999'
    

    But consider this query:

    select a.department_id, a.department_desc, b.employee_id, b.employee_name
      from departments a
      left join employees b
        on a.department_id = b.department_id
       and b.employee_id= '999'
    

    Now the criteria is in the on clause. So even if this employee works at no departments, he will still be returned (his ID and name). The department columns will be null, but we get a result (the employee side).

    You might think you would never want to use the outer joined table in the WHERE clause, but that is not necessarily the case. Normally it is, for the reason described above, though.

    Suppose you want all departments with no employees. Then you could run the following, which does use an outer join, and the outer joined table is used in the where clause:

    select a.department_id, a.department_desc, b.employee_id
      from departments a
      left join employees b
        on a.department_id = b.department_id
     where b.employee_id is null
    

    ^^ Shows departments with no employees.

    The above is likely the only legitimate reason you would want to use an outer joined table in the WHERE clause rather than the ON clause (which I think is what your question is; the difference between inner and outer joins is an entirely different topic).

    A good way to look at is this: You use outer joins to allow nulls. Why would you then use an outer join and say that a field should not be null and should be equal to 'XYZ'? If a value has to be 'XYZ' (not null), then why instruct the database to allow nulls to come back? It's like saying one thing and then overriding it later.