I come across a query which is,
SELECT it1.survey_set_id, it1.type, it1.value FROM survey_condition_filter it1 LEFT JOIN survey_condition_filter it2 ON(it1.survey_set_id = it2.survey_set_id AND it2.type = 3002) WHERE it1.type IN (2000, 2001, 2002) AND it2.value IS NULL;
Why is self left-join is used in the above query.
SELECT survey_set_id, type, value FROM survey_condition_filter WHERE type IN (2000, 2001, 2002);
isn't the above query is equivalent to the first query which used self left-join. since the query is just also filtering the IN (2000, 2001, 2002) AND it2.value IS NULL
. I am confused the use of join query here and can't really understand the working of the first query.
This is table survey_condition_filter
+---------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| survey_id | bigint | NO | PRI | NULL | |
| survey_set_id | bigint | NO | PRI | NULL | |
| type | int | NO | PRI | NULL | |
| condition | tinyint | NO | PRI | NULL | |
| value | varchar(15) | NO | PRI | NULL | |
| display_value | text | NO | | NULL | |
| order | int | YES | | NULL | |
| created_at | datetime | NO | | NULL | |
| created_by | varchar(255) | YES | | NULL | |
| updated_at | datetime | NO | | NULL | |
| updated_by | varchar(255) | YES | | NULL | |
| deleted_at | datetime | YES | | NULL | |
| deleted_by | varchar(255) | YES | | NULL | |
+---------------+--------------+------+-----+---------+-------+
This is an anti join. We outer join a table, but then only keep the outer joined rows (by applying WHERE ... it2.value IS NULL
here). It is much more common - and much more readable in my opinion - to use a straight-forward NOT EXISTS
(or NOT IN
in many situations).
SELECT it1.survey_set_id, it1.type, it1.value
FROM survey_condition_filter it1
WHERE it1.type IN (2000, 2001, 2002)
AND NOT EXISTS
(
SELECT null
FROM survey_condition_filter it2
WHERE it2.survey_set_id = it1.survey_set_id
AND it2.type = 3002
);
What this query does is obvious: select all type 2000/2001/2002 rows where not exists a 3002 row for the same survey_set_id .
If survey_set_id is a non-nullable column, you can use NOT IN
, which gets the query even a tad shorter:
SELECT it1.survey_set_id, it1.type, it1.value
FROM survey_condition_filter it1
WHERE it1.type IN (2000, 2001, 2002)
AND itl.survey_set_id NOT IN
(
SELECT it2.survey_set_id
FROM survey_condition_filter it2
WHERE it2.type = 3002
);