Yesterday I discovered something strange about SQL, or at least PostreSQL. Take a look below and explain why first query does nothing and the second one works just right:
-- this silently does nothing
update bodycontent
set body = replace(body, '~' || u.oldusername, '~' || u.newusername)
from usermigration u;
-- this works as expected
update bodycontent
set body = replace(body, '~' || oldusername, '~' || newusername)
from usermigration u;
Update: I think that everyone is missing the point on this question, the Cartesian product was an original intent: there are going to be N x M updates and this is by design.
I need to replace all the pairs of usernames existing in the migration table in each row from bodycontent
.
And, I repeat, the second version works as expected, but the first one does no update. All I wanted to know was why.
| usermigration table |
--------------------------
oldusername | newusersname
--------------------------
johndoe | johnd
john.smith | johnsmith
Is this a bug in PostgreSQL?
You are missing a WHERE
clause (which cannot be written as direct JOIN
condition). The additional table usermigration
has to be joined to the table bodycontent
to be updated in some way, or every row of bodycontent
has as many update candidates as there are rows in usermigration
- the Cartesian product between the two tables.
There is no way to tell which one will be applied and persist. Both statements are wrong in this regard. With 1000 rows in usermigration
and 1000 in bodycontent
we have 1 000 000 update candidates before 1000 can be picked.
Additional tables in the FROM
clause hardly ever make sense without a WHERE
clause connecting them to the main table.
When a
FROM
clause is present, what essentially happens is that the target table is joined to the tables mentioned in thefrom_item
list, and each output row of the join represents an update operation for the target table. When usingFROM
you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
The FROM
clause in the UPDATE
statement is a PostgreSQL extension to the SQL standard. Other DBMS use different syntax, for instance explicit joins to the table to be updated (in tSQL) that do not work for PostgreSQL.
This query works, mostly1:
UPDATE bodycontent b
SET body = replace(b.body, u.oldusername, u.newusername)
FROM usermigration u
WHERE b.body LIKE ('%' || u.oldusername || '%');
1 The outcome is still ambiguous. Multiple matches can be found. It is uncertain which will be applied. The problem is that your requirements are inherently ambiguous. There can be multiple (overlapping) usernames that match and the order in which updates are applied is relevant (but undefined). The UPDATE
statement perfectly reflects your flawed requirements.