I ran into an interesting problem in a system where due to a schema change, a first database transaction in a single thread blocks a second database transaction from completing, until a timeout occurs.
To test this I created a test database:
CREATE DATABASE StackOverflow
GO
USE StackOverflow
ALTER DATABASE StackOverflow SET ALLOW_SNAPSHOT_ISOLATION ON
ALTER DATABASE StackOverflow SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE
GO
CREATE TABLE One (
Id int CONSTRAINT pkOne PRIMARY KEY,
A varchar(10) NOT NULL
)
CREATE TABLE Two (
Id int CONSTRAINT pkTwo PRIMARY KEY,
B varchar(10) NOT NULL,
OneId int NOT NULL CONSTRAINT fkTwoToOne REFERENCES One
)
GO
-----------------------------------------------
CREATE TABLE Three (
Id int CONSTRAINT pkThree PRIMARY KEY,
SurrogateId int NOT NULL CONSTRAINT ThreeSurrUnique UNIQUE,
C varchar(10) NOT NULL
)
GO
CREATE TABLE Four (
Id int CONSTRAINT pkFour PRIMARY KEY,
D varchar(10) NOT NULL,
ThreeSurrogateId int NOT NULL CONSTRAINT fkFourToThree REFERENCES Three(SurrogateId)
)
GO
--Seed data
INSERT INTO One (Id, A) VALUES (1, '')
INSERT INTO Three (Id, SurrogateId, C) VALUES (3, 50, '')
In this first test, a transaction modifying a row in table One is started, but not yet committed. Another transaction is inserting into table Two, with a column referencing the same row being modified in the first transaction in table One. The second transaction will hang forever until the first transaction is committed.
The reason the transaction waits is due to a LCK_M_S keylock held by the first transaction.
In my second test, a transaction modifying a row in table Three is started, but not yet committed, just as in the first test. Another transaction is inserting into table Four, with a column referencing the same row being modified in the first transaction in table Three. Except this time, table Four references a surrogate key in table Three instead of the primary key. The transaction completed immediately and is unaffected by the first transaction.
I need help understanding why the the latter transaction is always blocked by the former transaction when inserting a row in a separate table that references a table that was modified in the first transaction. I think the obvious unhelpful answer is because of the foreign key constraint. But why? Especially because this is snapshot isolation, why does the latter transaction care about the former at all? The row it's referencing already exists and the foreign key can easily be verified, as proven by the second test where a foreign key referencing a surrogate key completes without obstruction.
The answer is rather simple.
When a query reads to validate Foreign Key constraints they always use locks, never row versioning. Imagine if one transaction is changing a PK value, and a concurrent session inserted a row referencing the old PK value. It isn't allowed to validate the FK constraint based on the consistent version of the row in the version store. If it were, then all the FK's would have to be validated again when the PK change was committed.
In the first case the update transaction has a key lock on the target index of the FK, so the concurrent session can't read the PK values.
In the second, the update doesn't affect the unique key involved in the FK. The update is able to place a Shared lock on the target key value, since the updating session has an exclusive key lock on a key in a different unique index.
In the first example after the first transaction commits, the second one fails with a snapshot isolation update conflict:
Msg 3960, Level 16, State 2, Line 10 Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table 'dbo.One' directly or indirectly in database 'StackOverflow' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.
This is because in SNAPSHOT isolation you cannot read a row that has changed since your transaction started. And since the FK validation can't use row versions, it needs to read the PK from a row that was updated after its transaction started. This is a violation of SNAPSHOT isolation as that PK value might not have existed at the beginning of the SNAPSHOT transaction.
This can be a little tricky to see, as the SNAPSHOT transaction doesn't really start at the point-in-time when you run BEGIN TRANSACTION (kind of like IMPLICIT TRANSACTIONS) the relevant point-in-time is the one when the transaction first reads or changes the database. EG
if @@trancount > 0 rollback
go
set transaction isolation level snapshot
begin transaction
drop table if exists t
create table t(id int)
--in another session run
--update one set a = a+'b' where id = 1
waitfor delay '0:0:10'
insert into two(id,b,oneid) values (2,'',1) -- fails