sqldatabasejoindistributed-databasesemi-join

What is semi-join in database?


I am having trouble while trying to understand the concept of semi-join and how it is different from conventional join. I have tried some article already but not satisfied with the explanation, could someone please help me to understand it?


Solution

  • A semi-join is basically a subquery that is applied in WHERE clause.

    Simple example. Let's select students with grades using INNER JOIN:

    SELECT DISTINCT s.id
    FROM students AS s
    INNER JOIN grades AS g
        ON g.student_id = s.id
    

    Now, the same with left semi-join:

    SELECT s.id
    FROM students AS s
    WHERE EXISTS (SELECT 1 
                  FROM grades AS g
                  WHERE g.student_id = s.id)
    

    The latter is generally more efficient (depending on concrete DBMS and query optimizer).

    But note that after matching, INNER JOIN returns the result of both tables, whereas left semi-join returns only the left table records (that's where semi comes from). So the usage may depend on your desired output.