In my project I ran into a problem with the T-SQL code below.
Problem:
In Step2, the WHILE loop and the SELECT statements use correlated subqueries, and the table UserModules appears in both the INSERT and its associated SELECT clause. This hampers performance, and the query frequently fails with the lock escalation error below.
The final data size in the ModulesUsers table is 42 million, and it is expected to grow.
Error Message: “The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.”
How can I optimize this query (i.e. Step2) to resolve the issue?
Step1:
INSERT INTO UserModules(ModuleID, UserID)
SELECT ModuleID, UserID
FROM TABLEA a
INNER JOIN TABLEB b ON a.ID = b.ID
Step2:
DECLARE @cnt int
SET @cnt = 1
WHILE( @cnt > 0 )
BEGIN
    SET @cnt = (SELECT COUNT(DISTINCT s.moduleid)
                FROM Modules_Hirarchy s WITH (nolock), Modules t
                WHERE s.ParentModuleId = t.ModuleId
                ------------
                AND NOT EXISTS
                    (SELECT ModuleId + EndUserId
                     FROM UserModules r
                     WHERE s.moduleid = r.moduleid
                     AND t.EndUserId = r.EndUserId)
                AND s.moduleid + t.EndUserId NOT IN
                    (SELECT CAST(ModuleId AS varchar) + EndUserId
                     FROM UserModules ))
    IF @cnt = 0
        BREAK
    INSERT INTO UserModules (ModuleId, EndUserId)
    SELECT DISTINCT s.moduleid, t.EndUserId
    FROM Modules_Hirarchy s WITH (nolock), UserModules t
    WHERE s.ParentModuleId = t.ModuleId
    AND NOT EXISTS
        (SELECT ModuleId + EndUserId
         FROM UserModules r
         WHERE s.moduleid = r.moduleid
         AND t.EndUserId = r.EndUserId)
END
Some data to play with:
create table #UserModules(ModuleID int, UserID int)
create table #Modules_Hirarchy(ParentModuleID int, ChildModuleID int)
insert into #UserModules (ModuleID , UserID)
values(1,1)
,(2,1)
,(3,1)
,(4,1)
,(5,1)
,(6,2)
,(7,2)
insert into #Modules_Hirarchy(ParentModuleID , ChildModuleID )
values (null,1)
,(1,2)
,(2,3)
,(3,4)
,(3,5)
,(null,6)
,(6,7)
Resolution (a recursive CTE instead of the WHILE loop):
-- the leading semicolon terminates the previous statement, which a CTE requires
;with cts(ModuleID, UserID, parentModule) as
(
    -- anchor: modules at the top of the hierarchy (no parent)
    select a.ModuleID, a.UserID, cast(null as int) as parentModule
    from #UserModules a join #Modules_Hirarchy b on a.ModuleID = b.ChildModuleID
    where b.ParentModuleID is null
    union all
    -- recursive step: each child module gets the user of its parent
    select b.ChildModuleID as ModuleID, a.UserID, b.ParentModuleID
    from cts a join #Modules_Hirarchy b
    on a.ModuleID = b.ParentModuleID
)
select *
into #RESULT
from cts
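To tie this back to the original INSERT, the rows in #RESULT that are not yet in #UserModules could be added with one set-based statement instead of a loop. This is only a minimal sketch, assuming the temp tables and the (ModuleID, UserID) columns from the sample data above; the real UserModules table in the question uses EndUserId, so the names would need adjusting.

```sql
-- Sketch only: assumes #RESULT was filled by the recursive CTE above and that
-- #UserModules has the (ModuleID, UserID) columns from the sample data.
insert into #UserModules (ModuleID, UserID)
select distinct r.ModuleID, r.UserID
from #RESULT r
where not exists
(
    -- skip pairs that are already stored
    select 1
    from #UserModules u
    where u.ModuleID = r.ModuleID
      and u.UserID = r.UserID
);
```

With the sample data every derived pair already exists in #UserModules, so this statement should insert no rows there; on real data it would add only the missing pairs.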
Edit: it's hard to say :) there are too many variables, but here are the things you should do to make the query efficient:
Create separate nonclustered indexes on the columns ModuleID, ParentModuleID and ChildModuleID.
You probably don't want to query all of the groups, only explicit ones, so filter out as many groups as possible in the anchor statement:
select a.ModuleID, a.UserID, cast(null as int) as parentModule
from #UserModules a join #Modules_Hirarchy b on a.ModuleID = b.ChildModuleID
where b.ParentModuleID is null and a.ModuleID in (listOfModules)
Add a unique index on the columns (ParentModuleID, ChildModuleID), because non-unique rows there may lead to a huge amount of row duplication.
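The index suggestions above could look roughly like this. It is a sketch under assumptions: the index names are made up for illustration, and the temp tables from the sample data stand in for the real tables.

```sql
-- Sketch only: index names are hypothetical; the #-tables stand in for the real ones.
create nonclustered index IX_UserModules_ModuleID
    on #UserModules (ModuleID);

create nonclustered index IX_ModulesHirarchy_ParentModuleID
    on #Modules_Hirarchy (ParentModuleID);

create nonclustered index IX_ModulesHirarchy_ChildModuleID
    on #Modules_Hirarchy (ChildModuleID);

-- Unique index so the same parent/child pair cannot be stored twice.
create unique nonclustered index UX_ModulesHirarchy_Parent_Child
    on #Modules_Hirarchy (ParentModuleID, ChildModuleID);
```

Since the unique index has ParentModuleID as its leading column, it can also serve lookups on ParentModuleID alone, so the separate ParentModuleID index may turn out to be redundant; that is worth checking against the actual execution plans.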
Apart from that, it depends on the data selectivity of ParentModuleID and ChildModuleID, but you can't do much about that.
I think it will work fine for big data sets, because the predicates are simple, as long as the data selectivity is high.