I have a hierarchical table with an id and parent_id column, the parent_id has a foreign key to the id column. Each row can only have one parent, but multiple rows can be tied to the same parent. I wanted to retrieve all the rows that didn't have any children.
I attempted this with the following query
SELECT *
FROM table
WHERE id NOT IN (
SELECT DISTINCT(parent_id)
FROM table
)
This returned 0 rows. If i change the NOT IN to IN it correctly returns the rows that have children (other rows tie to it through their parent_id)
I ended up getting this working:
SELECT *
FROM table
WHERE id NOT IN(
SELECT id
FROM table
WHERE id IN (
SELECT DISTINCT(parent_id)
FROM table
)
)
But I don't understand why the first query didn't work? Can anybody help me to understand what's going on here? Do I not understand how NOT IN should work?
Try
SELECT *
FROM table
WHERE id NOT IN (
SELECT DISTINCT(parent_id)
FROM table
WHERE parent_id IS NOT NULL
)
For example:
with t(x) as (values(1),(2))
select 3
where 3 not in (select x from t);
┌──────────┐
│ ?column? │
├──────────┤
│ 3 │
└──────────┘
but
with t(x) as (values(1),(2),(null))
select 3
where 3 not in (select x from t);
┌──────────┐
│ ?column? │
├──────────┤
└──────────┘
It is because the DBMS can't made decision is id = null
or not (the result is undefined
)
You could to fix it as mentioned above or using not exists
:
with t(x) as (values(1),(2),(null))
select 3
where not exists (select x from t where x = 3);
┌──────────┐
│ ?column? │
├──────────┤
│ 3 │
└──────────┘