I'm using PostgreSQL 14. Create the following table:
create table users
(
id serial primary key,
first_name varchar(255) not null,
last_name varchar(255) not null
);
insert into users(first_name, last_name) values ('a', 'a');
These two statements when executed in two separate concurrent transactions, with isolation level set to SERIALIZABLE, conflict. The transaction started later will:
begin isolation level serializable;
update users set first_name = 'aa' where first_name = 'a';
commit;
-- Concurrently, note it operates on last_name instead
begin isolation level serializable;
update users set last_name = 'aa' where last_name = 'a';
commit;
Why is that? It seems to me allowing both transactions to proceed wouldn't violate any non-serializability invariants (these transactions can be executed in any order, and the result will be the same), or create any phenomena disallowed by lower isolation levels.
Is it because isolation levels are implemented with whole row locks, and it's a leaking implementation?
these transactions can be executed in any order, and the result will be the same
In order, yes. Concurrently, with their results serialized after the fact, no.
In Postgres MVCC an update
is really a delete
and an insert
. It marks the old version of the row for deletion and inserts a new one, elsewhere. Your two concurrent transactions agree the old row version has to be deleted but if they were to finish, they'd leave two distinct "latest" row versions. Those are full rows, not just field/cell deltas you could resolve and merge.
If you run these transactions in order, the second one reads then invalidates the output from the first one - before writing its own. Concurrently, serializable
doesn't attempt to feed them into one another, it just sees they're both modifying the same row and doesn't even care how.
Even if both transactions do a neutral non-update (assuming no intervening rule
s and trigger
s):
update users set first_name=first_name;
Regardless of whether they tried to self-rewrite the same column or a different one, this is still non-serializable. No conflict resolution is attempted, no auto-merging of deltas.
Is it because isolation levels are implemented with whole row locks, and it's a leaking implementation?
Correct. It's easy to emulate field/cell/value-level locking using an Entity-Attribute-Value model, or 6th normal form. Except for the key (and the attribute identifier in EAV), in either case the value is the only value in a row so a row lock becomes effectively a value lock.
6NF: demo1 at db<>fiddle
create table users
( id int generated by default as identity primary key);
create table users_first_name_6nf
( id int references users(id) not null,
first_name text not null );
create table users_last_name_6nf
( id int references users(id) not null,
last_name text not null
);
with usr as(
insert into users
default values
returning id)
,first_name as(
insert into users_first_name_6nf
select usr.id,'a'
from usr)
insert into users_last_name_6nf
select usr.id,'a'
from usr;
select* from users
natural join users_first_name_6nf
natural join users_last_name_6nf;
Then even though the two updates target the same user, they are performed on different rows of different tables.
In EAV, they are on different rows of the same table: demo2 at db<>fiddle
create table users
( id int generated by default as identity primary key);
create type user_attributes as enum('first_name','last_name');
create table users_eav
( id int references users(id) not null,
attribute user_attributes,
val text);
with usr as(insert into users default values returning id)
insert into users_eav
select usr.id, v.attribute::user_attributes, v.val
from usr
cross join(values('first_name','a')
,('last_name','a'))as v(attribute,val);
select id
, fname.val as first_name
, lname.val as last_name
from users
join users_eav as fname using(id)
join users_eav as lname using(id)
where fname.attribute='first_name'
and lname.attribute='last_name';
Both models are widely discussed on both Wikipedia, here on SO and pretty much everywhere else, so I'll leave further commentary out. This is by no means a suggestion either of them is a good idea or even offers tolerable performance or space efficiency.
The default read committed
mode handles this naturally, with no changes to the model: the first update
acquires a lock on the row, the second update sees it and waits for the first transaction holding it to complete and release it before it can continue. So in a way, they "naturally serialize", queue up when needed.
It doesn't matter whether you request the lock explicitly with select..for update
or let your update
get one implicitly.