Ok, I'm totally lost on deadlock issue. I just don't know how to solve this.
I have these three tables (I have removed not important columns):
CREATE TABLE [dbo].[ManageServicesRequest]
(
[ReferenceTransactionId] INT NOT NULL,
[OrderDate] DATETIMEOFFSET(7) NOT NULL,
[QueuePriority] INT NOT NULL,
[Queued] DATETIMEOFFSET(7) NULL,
CONSTRAINT [PK_ManageServicesRequest] PRIMARY KEY CLUSTERED ([ReferenceTransactionId]),
)
CREATE TABLE [dbo].[ServiceChange]
(
[ReferenceTransactionId] INT NOT NULL,
[ServiceId] VARCHAR(50) NOT NULL,
[ServiceStatus] CHAR(1) NOT NULL,
[ValidFrom] DATETIMEOFFSET(7) NOT NULL,
CONSTRAINT [PK_ServiceChange] PRIMARY KEY CLUSTERED ([ReferenceTransactionId],[ServiceId]),
CONSTRAINT [FK_ServiceChange_ManageServiceRequest] FOREIGN KEY ([ReferenceTransactionId]) REFERENCES [ManageServicesRequest]([ReferenceTransactionId]) ON DELETE CASCADE,
INDEX [IDX_ServiceChange_ManageServiceRequestId] ([ReferenceTransactionId]),
INDEX [IDX_ServiceChange_ServiceId] ([ServiceId])
)
CREATE TABLE [dbo].[ServiceChangeParameter]
(
[ReferenceTransactionId] INT NOT NULL,
[ServiceId] VARCHAR(50) NOT NULL,
[ParamCode] VARCHAR(50) NOT NULL,
[ParamValue] VARCHAR(50) NOT NULL,
[ParamValidFrom] DATETIMEOFFSET(7) NOT NULL,
CONSTRAINT [PK_ServiceChangeParameter] PRIMARY KEY CLUSTERED ([ReferenceTransactionId],[ServiceId],[ParamCode]),
CONSTRAINT [FK_ServiceChangeParameter_ServiceChange] FOREIGN KEY ([ReferenceTransactionId],[ServiceId]) REFERENCES [ServiceChange] ([ReferenceTransactionId],[ServiceId]) ON DELETE CASCADE,
INDEX [IDX_ServiceChangeParameter_ManageServiceRequestId] ([ReferenceTransactionId]),
INDEX [IDX_ServiceChangeParameter_ServiceId] ([ServiceId]),
INDEX [IDX_ServiceChangeParameter_ParamCode] ([ParamCode])
)
And these two procedures:
CREATE PROCEDURE [dbo].[spCreateManageServicesRequest]
@ReferenceTransactionId INT,
@OrderDate DATETIMEOFFSET,
@QueuePriority INT,
@Services ServiceChangeUdt READONLY,
@Parameters ServiceChangeParameterUdt READONLY
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY
/* VYTVOŘ NOVÝ REQUEST NA ZMĚNU SLUŽEB */
/* INSERT REQUEST */
INSERT INTO [dbo].[ManageServicesRequest]
([ReferenceTransactionId]
,[OrderDate]
,[QueuePriority]
,[Queued])
VALUES
(@ReferenceTransactionId
,@OrderDate
,@QueuePriority
,NULL)
/* INSERT SERVICES */
INSERT INTO [dbo].[ServiceChange]
([ReferenceTransactionId]
,[ServiceId]
,[ServiceStatus]
,[ValidFrom])
SELECT
@ReferenceTransactionId AS [ReferenceTransactionId]
,[ServiceId]
,[ServiceStatus]
,[ValidFrom]
FROM @Services AS [S]
/* INSERT PARAMS */
INSERT INTO [dbo].[ServiceChangeParameter]
([ReferenceTransactionId]
,[ServiceId]
,[ParamCode]
,[ParamValue]
,[ParamValidFrom])
SELECT
@ReferenceTransactionId AS [ReferenceTransactionId]
,[ServiceId]
,[ParamCode]
,[ParamValue]
,[ParamValidFrom]
FROM @Parameters AS [P]
END TRY
BEGIN CATCH
THROW
END CATCH
END
CREATE PROCEDURE [dbo].[spGetManageServicesRequest]
@ReferenceTransactionId INT
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY
/* VRAŤ MANAGE SERVICES REQUEST PODLE ID */
SELECT
[MR].[ReferenceTransactionId],
[MR].[OrderDate],
[MR].[QueuePriority],
[MR].[Queued],
[SC].[ReferenceTransactionId],
[SC].[ServiceId],
[SC].[ServiceStatus],
[SC].[ValidFrom],
[SP].[ReferenceTransactionId],
[SP].[ServiceId],
[SP].[ParamCode],
[SP].[ParamValue],
[SP].[ParamValidFrom]
FROM [dbo].[ManageServicesRequest] AS [MR]
LEFT JOIN [dbo].[ServiceChange] AS [SC] ON [SC].[ReferenceTransactionId] = [MR].[ReferenceTransactionId]
LEFT JOIN [dbo].[ServiceChangeParameter] AS [SP] ON [SP].[ReferenceTransactionId] = [SC].[ReferenceTransactionId] AND [SP].[ServiceId] = [SC].[ServiceId]
WHERE [MR].[ReferenceTransactionId] = @ReferenceTransactionId
END TRY
BEGIN CATCH
THROW
END CATCH
END
Now these are used this way (it's a simplified C# method that creates a record and then posts record to a micro service queue):
public async Task Consume(ConsumeContext<CreateCommand> context)
{
using (var sql = sqlFactory.Cip)
{
/*SAVE REQUEST TO DATABASE*/
sql.StartTransaction(System.Data.IsolationLevel.Serializable); <----- First transaction starts
/* Create id */
var transactionId = await GetNewId(context.Message.CorrelationId);
/* Create manage services request */
await sql.OrderingGateway.ManageServices.Create(transactionId, context.Message.ApiRequest.OrderDate, context.Message.ApiRequest.Priority, services);
sql.Commit(); <----- First transaction ends
/// .... Some other stuff ...
/* Fetch the same object you created in the first transaction */
Try
{
sql.StartTransaction(System.Data.IsolationLevel.Serializable);
var request = await sql.OrderingGateway.ManageServices.Get(transactionId); <----- HERE BE THE DEADLOCK,
request.Queued = DateTimeOffset.Now;
await sql.OrderingGateway.ManageServices.Update(request);
... Here is a posting to a microservice queue ...
sql.Commit();
}
catch (Exception)
{
sql.RollBack();
}
/// .... Some other stuff ....
}
Now my problem is. Why are these two procedures getting deadlocked? The first and the second transaction are never run in parallel for the same record.
Here is the deadlock detail:
<deadlock>
<victim-list>
<victimProcess id="process1dbfa86c4e8" />
</victim-list>
<process-list>
<process id="process1dbfa86c4e8" taskpriority="0" logused="0" waitresource="KEY: 18:72057594046775296 (b42d8e559092)" waittime="2503" ownerId="33411557480" transactionname="user_transaction" lasttranstarted="2021-12-01T01:06:15.303" XDES="0x1ddd2df4420" lockMode="RangeS-S" schedulerid="20" kpid="23000" status="suspended" spid="55" sbid="2" ecid="0" priority="0" trancount="1" lastbatchstarted="2021-12-01T01:06:15.310" lastbatchcompleted="2021-12-01T01:06:15.300" lastattention="1900-01-01T00:00:00.300" clientapp="Core Microsoft SqlClient Data Provider" hostpid="11020" isolationlevel="serializable (4)" xactid="33411557480" currentdb="18" currentdbname="xxx" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
<executionStack>
<frame procname="xxx.dbo.spGetManageServicesRequest" line="10" stmtstart="356" stmtend="4256" sqlhandle="0x030012001374fc02f91433019aad000001000000000000000000000000000000000000000000000000000000"></frame>
</executionStack>
</process>
<process id="process1dbfa1c1c28" taskpriority="0" logused="1232" waitresource="KEY: 18:72057594046971904 (ffffffffffff)" waittime="6275" ownerId="33411563398" transactionname="user_transaction" lasttranstarted="2021-12-01T01:06:16.450" XDES="0x3d4e842c420" lockMode="RangeI-N" schedulerid="31" kpid="36432" status="suspended" spid="419" sbid="2" ecid="0" priority="0" trancount="2" lastbatchstarted="2021-12-01T01:06:16.480" lastbatchcompleted="2021-12-01T01:06:16.463" lastattention="1900-01-01T00:00:00.463" clientapp="Core Microsoft SqlClient Data Provider" hostpid="11020" isolationlevel="serializable (4)" xactid="33411563398" currentdb="18" currentdbname="xxx" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
<executionStack>
<frame procname="xxx.dbo.spCreateManageServicesRequest" line="40" stmtstart="2592" stmtend="3226" sqlhandle="0x03001200f01ab84aeb1433019aad000001000000000000000000000000000000000000000000000000000000"></frame>
</executionStack>
</process>
</process-list>
<resource-list>
<keylock hobtid="72057594046775296" dbid="18" objectname="xxx.dbo.ServiceChange" indexname="PK_ServiceChange" id="lock202ecfd0380" mode="X" associatedObjectId="72057594046775296">
<owner-list>
<owner id="process1dbfa1c1c28" mode="X" />
</owner-list>
<waiter-list>
<waiter id="process1dbfa86c4e8" mode="RangeS-S" requestType="wait" />
</waiter-list>
</keylock>
<keylock hobtid="72057594046971904" dbid="18" objectname="xxx.dbo.ServiceChangeParameter" indexname="PK_ServiceChangeParameter" id="lock27d3d371880" mode="RangeS-S" associatedObjectId="72057594046971904">
<owner-list>
<owner id="process1dbfa86c4e8" mode="RangeS-S" />
</owner-list>
<waiter-list>
<waiter id="process1dbfa1c1c28" mode="RangeI-N" requestType="wait" />
</waiter-list>
</keylock>
</resource-list>
</deadlock>
Why is this deadlock happening? How do I avoid it in the future?
Edit: Here is a plan for Get procedure: https://www.brentozar.com/pastetheplan/?id=B1UMMhaqF
Another Edit: After GSerg comment, I changed the line number in the deadlock graph from 65 to 40, due to removed columns that are not important to the question.
You are better off avoiding serializable isolation level. The way the serializable guarantee is provided is often deadlock prone.
If you can't alter your stored procs to use more targeted locking hints that guarantee the results you require at a lesser isolation level then you can prevent this particular deadlock scenario shown by ensuring that all locks are taken out on ServiceChange
first before any are taken out on ServiceChangeParameter
.
One way of doing this would be to introduce a table variable in spGetManageServicesRequest
and materialize the results of
SELECT ...
FROM [dbo].[ManageServicesRequest] AS [MR]
LEFT JOIN [dbo].[ServiceChange] AS [SC] ON [SC].[ReferenceTransactionId] = [MR].[ReferenceTransactionId]
to the table variable.
Then join that against [dbo].[ServiceChangeParameter]
to get your final results.
The phase separation introduced by the table variable will ensure the SELECT
statement acquires the locks in the same object order as the insert is doing so prevent deadlocks where the SELECT
statement already holds a lock on ServiceChangeParameter
and is waiting to acquire one on ServiceChange
(as in the deadlock graph here).
It may be instructive to look at the exact locks taken out by the SELECT
running at serializable isolation level. These can be seen with extended events or undocumented trace flag 1200.
Currently your execution plan is below.
For the following example data
INSERT INTO [dbo].[ManageServicesRequest]
VALUES (26410821, GETDATE(), 1, GETDATE()),
(26410822, GETDATE(), 1, GETDATE()),
(26410823, GETDATE(), 1, GETDATE());
INSERT INTO [dbo].[ServiceChange]
VALUES (26410821, 'X', 'X', GETDATE()),
(26410822, 'X', 'X', GETDATE()),
(26410823, 'X', 'X', GETDATE());
INSERT INTO [dbo].[ServiceChangeParameter]
VALUES (26410821, 'X', 'P1','P1', GETDATE()),
(26410823, 'X', 'P1','P1', GETDATE());
The trace flag output (for WHERE [MR].[ReferenceTransactionId] = 26410822
) is
Process 51 acquiring IS lock on OBJECT: 7:1557580587:0 (class bit2000000 ref1) result: OK
Process 51 acquiring IS lock on OBJECT: 7:1509580416:0 (class bit2000000 ref1) result: OK
Process 51 acquiring IS lock on OBJECT: 7:1477580302:0 (class bit2000000 ref1) result: OK
Process 51 acquiring IS lock on PAGE: 7:1:600 (class bit2000000 ref0) result: OK
Process 51 acquiring S lock on KEY: 7:72057594044940288 (1b148afa48fb) (class bit2000000 ref0) result: OK
Process 51 acquiring IS lock on PAGE: 7:1:608 (class bit2000000 ref0) result: OK
Process 51 acquiring RangeS-S lock on KEY: 7:72057594045005824 (a69d56b089b6) (class bit2000000 ref0) result: OK
Process 51 acquiring IS lock on PAGE: 7:1:632 (class bit2000000 ref0) result: OK
Process 51 acquiring RangeS-S lock on KEY: 7:72057594045202432 (c37d1982c3c9) (class bit2000000 ref0) result: OK
Process 51 acquiring RangeS-S lock on KEY: 7:72057594045005824 (2ef5265f2b42) (class bit2000000 ref0) result: OK
The order of locks taken is indicated in the image below. Range locks apply to the range of possible values from the given key value, to the nearest key value below it (in key order - so above it in the image!).
First node 1 is called and it takes an S
lock on the row in ManageServicesRequest
, then node 2 is called and a RangeS-S
lock is taken on a key in ServiceChange
the values from this row are then used to do the lookup in ServiceChangeParameter
- in this case there are no matching rows for the predicate but a RangeS-S
lock is still taken out covering the range from the next highest key to the preceding one (range (26410821, 'X', 'P1') ... (26410823, 'X', 'P1')
in this case).
Then node 2 is called again to see if there are any more rows. Even in the case that there aren't an additional RangeS-S
lock is taken on the next row in ServiceChange
.
In the case of your deadlock graph it seems that the range being locked in ServiceChangeParameter
is the range to infinity (denoted by ffffffffffff
) - this will happen here when it does a look up for a key value at or beyond the last key in the index.
An alternative to the table variable might also be to change the query as below.
SELECT ...
FROM [dbo].[ManageServicesRequest] AS [MR]
LEFT JOIN [dbo].[ServiceChange] AS [SC] ON [SC].[ReferenceTransactionId] = [MR].[ReferenceTransactionId]
LEFT HASH JOIN [dbo].[ServiceChangeParameter] AS [SP] ON [SP].[ReferenceTransactionId] = [MR].[ReferenceTransactionId] AND [SP].[ServiceId] = [SC].[ServiceId]
WHERE [MR].[ReferenceTransactionId] = @ReferenceTransactionId
The final predicate on [dbo].[ServiceChangeParameter] is changed to reference [MR].[ReferenceTransactionId]
instead of [SC].[ReferenceTransactionId]
and an explicit hash join hint is added.
This gives a plan like the below where all the locks on ServiceChange
are taken during the hash table build stage before any are taken on ServiceChangeParameter
- without changing the ReferenceTransactionId
condition the new plan had a scan rather than a seek on ServiceChangeParameter
which is why that change was made (it allows the optimiser to use the implied equality predicate on @ReferenceTransactionId)