I am having trouble while trying to understand the concept of semi-join and how it is different from conventional join. I have tried some article already but not satisfied with the explanation, could someone please help me to understand it?
A semi-join is basically a subquery that is applied in WHERE
clause.
Simple example. Let's select students with grades using INNER JOIN
:
SELECT DISTINCT s.id
FROM students AS s
INNER JOIN grades AS g
ON g.student_id = s.id
Now, the same with left semi-join:
SELECT s.id
FROM students AS s
WHERE EXISTS (SELECT 1
FROM grades AS g
WHERE g.student_id = s.id)
The latter is generally more efficient (depending on concrete DBMS and query optimizer).
But note that after matching, INNER JOIN
returns the result of both tables, whereas left semi-join returns only the left table records (that's where semi comes from). So the usage may depend on your desired output.