performance  t-sql  insert  correlated-subquery  query-tuning

SQL Server: performance tuning of a WHILE loop with a correlated subquery


In my project I ran into a problem with the T-SQL code below.

  1. Step 1 populates the UserModules table with parent modules and their subscribed users.
  2. Step 2 looks up, in the Modules_Hierarchy table, the child modules of the modules from step 1 and inserts the valid records into UserModules, mapping each child module to the users subscribed to its parent module. This step repeats until no more child modules are found.

Problem:

In step 2, the WHILE loop and the SELECT statement use correlated subqueries, and the UserModules table appears in both the INSERT and its associated SELECT clause. This hampers performance, and the query frequently fails with the lock escalation error below.

The final data size in the ModulesUsers table is 42 million and it is expected to grow.

Error Message: “The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.”

How can I optimize this query (step 2) to resolve the issue?

Step 1:

INSERT INTO UserModules(ModuleID, UserID)
  SELECT ModuleID, UserID
  FROM TABLEA a
  INNER JOIN TABLEB b ON a.ID = b.ID

Step 2:

DECLARE @cnt int
SET @cnt = 1

WHILE( @cnt > 0 )      
BEGIN      

  SET @cnt = (SELECT COUNT(DISTINCT s.moduleid)
              FROM Modules_Hirarchy s WITH (nolock), Modules t      
              WHERE s.ParentModuleId = t.ModuleId      
              ------------      
                AND NOT EXISTS       
                 (SELECT ModuleId + EndUserId 
                  FROM UserModules  r      
                  WHERE s.moduleid = r.moduleid 
                    AND t.EndUserId = r.EndUserId)
                AND s.moduleid + t.EndUserId NOT IN 
                  (SELECT CAST(ModuleId AS varchar) + EndUserId 
                   FROM UserModules ))      

  IF @cnt = 0      
    BREAK      

  INSERT INTO UserModules (ModuleId, EndUserId)      
    SELECT DISTINCT s.moduleid, t.EndUserId       
    FROM Modules_Hirarchy s WITH (nolock), UserModules  t      
    WHERE s.ParentModuleId = t.ModuleId      
      AND NOT EXISTS       
       (SELECT ModuleId + EndUserId 
        FROM UserModules  r      
        WHERE s.moduleid = r.moduleid 
          AND t.EndUserId = r.EndUserId)

END  

Solution

  • Some data to play with:

    create table #UserModules(ModuleID int, UserID int)
    
    create table #Modules_Hirarchy(ParentModuleID int, ChildModuleID int)
    
    insert into #UserModules (ModuleID , UserID)
    values(1,1)
    ,(2,1)
    ,(3,1)
    ,(4,1)
    ,(5,1)
    ,(6,2)
    ,(7,2)
    
    insert into #Modules_Hirarchy(ParentModuleID , ChildModuleID )
    values (null,1)
    ,(1,2)
    ,(2,3)
    ,(3,4)
    ,(3,5)
    ,(null,6)
    ,(6,7)
    

    Resolution:

    -- leading semicolon: a CTE must follow a terminated statement
    ;with cts (ModuleID, UserID, parentModule) as
    (
        -- anchor: users subscribed to root modules (no parent)
        select a.ModuleID, a.UserID, cast(null as int) as parentModule
        from #UserModules a
        join #Modules_Hirarchy b on a.ModuleID = b.ChildModuleID
        where b.ParentModuleID is null

        union all

        -- recursive step: pass each user down to the child modules
        select b.ChildModuleID as ModuleID, a.UserID, b.ParentModuleID
        from cts a
        join #Modules_Hirarchy b on a.ModuleID = b.ParentModuleID
    )
    select *
    into #RESULT
    from cts
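
As a rough sketch of how the recursive CTE could stand in for the WHILE loop from step 2, the version below inserts only the (module, user) pairs that are not yet present. It assumes the same temp table layout as the sample data, that only root modules are subscribed at the start (as after step 1), and that root modules are marked by a NULL ParentModuleID; the real tables may differ. `DROP TABLE IF EXISTS` needs SQL Server 2016 or later.

```sql
-- Stand-in tables with the same shape as the sample data (assumed layout).
DROP TABLE IF EXISTS #UserModules, #Modules_Hirarchy;
CREATE TABLE #UserModules (ModuleID int NOT NULL, UserID int NOT NULL);
CREATE TABLE #Modules_Hirarchy (ParentModuleID int NULL, ChildModuleID int NOT NULL);

-- Only the root modules have subscribers so far, as after step 1.
INSERT INTO #UserModules (ModuleID, UserID)
VALUES (1, 1), (6, 2);

INSERT INTO #Modules_Hirarchy (ParentModuleID, ChildModuleID)
VALUES (NULL, 1), (1, 2), (2, 3), (3, 4), (3, 5), (NULL, 6), (6, 7);

WITH cts (ModuleID, UserID) AS
(
    -- Anchor: users subscribed to root modules.
    SELECT a.ModuleID, a.UserID
    FROM #UserModules AS a
    JOIN #Modules_Hirarchy AS b ON a.ModuleID = b.ChildModuleID
    WHERE b.ParentModuleID IS NULL

    UNION ALL

    -- Recursive step: hand each user down to the child modules.
    SELECT b.ChildModuleID, a.UserID
    FROM cts AS a
    JOIN #Modules_Hirarchy AS b ON a.ModuleID = b.ParentModuleID
)
-- One set-based insert of the pairs that are still missing.
INSERT INTO #UserModules (ModuleID, UserID)
SELECT DISTINCT c.ModuleID, c.UserID
FROM cts AS c
WHERE NOT EXISTS (SELECT 1
                  FROM #UserModules AS r
                  WHERE r.ModuleID = c.ModuleID
                    AND r.UserID = c.UserID);

SELECT ModuleID, UserID
FROM #UserModules
ORDER BY UserID, ModuleID;
```

A recursive CTE in SQL Server stops with an error after 100 recursion levels by default. If the hierarchy can be deeper than that, an `OPTION (MAXRECURSION n)` hint at the end of the statement raises the limit. A cycle in the hierarchy would still need to be ruled out separately.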
    

    Edit: it's hard to say :) there are too many variables, but here are things you should do to make the query efficient:

    1. Create separate nonclustered indexes on the columns ModuleID, ParentModuleID and ChildModuleID.

    2. You probably don't want to query all of the groups, only specific ones, so filter out as many groups as possible in the anchor statement:

      select a.ModuleID, a.UserID, CAST(null as int) as parentModule
      from #UserModules a
      join #Modules_Hirarchy b on a.ModuleID = b.ChildModuleID
      where b.ParentModuleID is null
        and a.ModuleId in (listOfModules)

    3. Add a unique index on the columns (ParentModuleID, ChildModuleID), because non-unique rows there may lead to a huge amount of row duplication.
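
The index suggestions in points 1 and 3 might look roughly like the sketch below. The index names are made up for illustration, and the tables are the sample temp tables recreated so the snippet runs on its own. On the real tables the existing clustered index and workload should be checked first. `DROP TABLE IF EXISTS` needs SQL Server 2016 or later.

```sql
-- Recreate the sample tables (assumed layout) so the sketch is self-contained.
DROP TABLE IF EXISTS #UserModules, #Modules_Hirarchy;
CREATE TABLE #UserModules (ModuleID int NOT NULL, UserID int NOT NULL);
CREATE TABLE #Modules_Hirarchy (ParentModuleID int NULL, ChildModuleID int NOT NULL);

-- Point 1: separate nonclustered indexes on the join columns.
-- Index names are hypothetical.
CREATE NONCLUSTERED INDEX IX_UserModules_ModuleID
    ON #UserModules (ModuleID);

CREATE NONCLUSTERED INDEX IX_Hirarchy_ParentModuleID
    ON #Modules_Hirarchy (ParentModuleID);

CREATE NONCLUSTERED INDEX IX_Hirarchy_ChildModuleID
    ON #Modules_Hirarchy (ChildModuleID);

-- Point 3: a unique index rejects duplicate (parent, child) pairs,
-- which would otherwise multiply rows at every recursion level.
CREATE UNIQUE NONCLUSTERED INDEX UX_Hirarchy_Parent_Child
    ON #Modules_Hirarchy (ParentModuleID, ChildModuleID);
```

Creating the unique index fails if duplicate pairs already exist, so on existing data those would have to be cleaned up first.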

    Beyond that, it depends on the selectivity of the data in ParentModuleID and ChildModuleID, and you can't do much about that.

    I think it will work fine for big data sets, since the predicates are simple, as long as data selectivity is high.