sqlpostgresqljoinsql-updatecommon-table-expression

UPDATE with LEFT JOIN and condition IS NULL


I have the following table definitions:

Table public.messages:

Column Type Collation Nullable Default
ip text
msg text
ignore boolean

Table public.host:

Column Type Collation Nullable Default
ip text
name text

Yes, I could set type as IP for the ip, but for this example it doesn't matter.

I run this query:

SELECT * 
FROM messages 
LEFT JOIN host ON messages.ip = host.ip;

And get this result:

ip msg ignore ip name
1.1.1.1 Test1 1.1.1.1 host1
1.1.1.2 Test2

Sample data can be found here: https://dbfiddle.uk/3LTo9shO

I want to update all rows where there is no host name to set ignore to true.

So basically, just the ignore column for this result:

SELECT * 
FROM messages 
LEFT JOIN host ON messages.ip = host.ip 
WHERE host.name IS NULL;
ip msg ignore ip name
1.1.1.2 Test2

I can do an update for the row that does have a host name. When I set the value, everything works:

UPDATE messages 
SET ignore = FALSE 
FROM host 
WHERE messages.ip = host.ip AND host.name IS NOT NULL;

UPDATE 1

SELECT * 
FROM messages 
LEFT JOIN host ON messages.ip = host.ip;
ip msg ignore ip name
1.1.1.1 Test1 f 1.1.1.1 host1
1.1.1.2 Test2

However, setting true for those that don't have a host name doesn't work.

UPDATE messages 
SET ignore = TRUE 
FROM host 
WHERE messages.ip = host.ip AND host.name IS NULL;

UPDATE 0

I tried with a LEFT JOIN, but then everything was updated:

UPDATE messages 
SET ignore = TRUE 
FROM messages m 
LEFT JOIN host h ON m.ip = h.ip 
WHERE h.name IS NULL;

UPDATE 2

SELECT * 
FROM messages 
LEFT JOIN host ON messages.ip = host.ip;
ip msg ignore ip name
1.1.1.1 Test1 t 1.1.1.1 host1
1.1.1.2 Test2 t

https://dbfiddle.uk/fOJ55JBM

How can I update just rows without a host name?


Solution

  • update messages as m
    set ignore=not exists(select from host as h 
                          where h.ip=m.ip
                            and h.name is not null);
    
    1. I added a third test case with a host entry matching a message but with a null in host.name. If it's not allowed, and h.name is not null above can be removed.
    2. I'm updating all rows; messages with a named host get an ignore=false, those without a host or with an unnamed one get a true. Depending on cardinalities it might be more efficient to only target one group or the other, then apply a default opposite value wherever it remained null, indicating it belongs to the opposite group.
    3. (Addressed in the question, so this one's just for posterity) Don't use type varchar for IP addresses. Postgres offers quick and lightweight built-in inet with proper validation, indexes, functions and operators. If you ever need MAC addresses, there's also macaddr.
    4. Don't default to varchar in general, especially a limited one. Plain text type is a better default.
    5. Postgres supports empty select lists and exists discards everything on them. You can select/*nothing*/from tables if you're just checking counts or presence/existence.
    6. If the join column name matches, there's a convenient messages join host using(ip) syntax - no aliasing or repeated identifiers, no dots, no operators. Especially helpful if you otherwise need to type out a long list of a.x=b.x and a.y=b.y and.. comparisons that reduce to using(x,y,..).

    Posting mainly as an opportunity to add these points. This demo also shows how you do that with a CTE, which I don't necessarily enjoy either, but it does let you expand it to a multi-table update Postgres doesn't otherwise support.

    Performance wise, the optimal approach heavily depends on how many records you have on either side, how many match this comparison as well as what indexes you already have in place, or how flexible you are when it comes to adding more. The demo shows an example involving 150k messages between about 262144 possible IP's, out of which 30k are known hosts, among which 10% doesn't have a name and about 5k share an IP. Exec times of the solutions proposed here are comparable in this case.