sql-serverperformancequery-optimizationdatabase-performance

Performance issues with recursive SQL Server stored procedure for hierarchical data


I have a stored procedure GetCombinedRequestInfosByEmail that retrieves hierarchical data for employees. The procedure performs well for lower-level employees but takes around 55 seconds to execute for higher-level employees (e.g., CEO with 30,000 employees). For managers with fewer employees (10-7), it executes in milliseconds.

I am using another stored procedure GetRequestInfosByUser within this procedure, which dynamically constructs and executes a SQL query.

My current version of the stored procedure:

ALTER PROCEDURE GetCombinedRequestInfosByEmail
    @Email NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    CREATE TABLE #EmployeeHierarchy 
    (
        employee_id NVARCHAR(50) PRIMARY KEY,
        employee_email NVARCHAR(255),
        level INT
    );

    WITH EmployeeHierarchy AS 
    (
        SELECT 
            u.userid AS employee_id, 
            u.Email AS employee_email,
            u.manageruserid AS manager_id,
            0 AS level
        FROM 
            UserTable u
        WHERE 
            u.Email = @Email

        UNION ALL

        SELECT 
            u.userid AS employee_id, 
            u.Email AS employee_email,
            u.manageruserid AS manager_id,
            eh.level + 1
        FROM 
            UserTable u
        JOIN 
            EmployeeHierarchy eh ON u.manageruserid = eh.employee_id
    )
    INSERT INTO #EmployeeHierarchy (employee_id, employee_email, level)
        SELECT DISTINCT
            employee_id, employee_email, level
        FROM 
            EmployeeHierarchy 
        WHERE 
            level > 0
        ORDER BY 
            level, employee_id;

    CREATE TABLE #RequestInfodata 
    (
        RequestID INT,
        RequestCreatedByEmailID NVARCHAR(200),
        ApplicationName NVARCHAR(200),
        ITowner NVARCHAR(100),
        Businessowner NVARCHAR(100),
        RiskScore INT,
        RequestCreatedBy NVARCHAR(24)
    );

    DECLARE @employee_id NVARCHAR(50);
    DECLARE @employee_email NVARCHAR(255);

    DECLARE employee_cursor CURSOR FOR
        SELECT employee_id, employee_email
        FROM #EmployeeHierarchy;

    OPEN employee_cursor;

    FETCH NEXT FROM employee_cursor INTO @employee_id, @employee_email;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        INSERT INTO #RequestInfodata
            EXEC GetRequestInfosByUser 
                    @UserId = @employee_id, 
                    @UserEmail = @employee_email,        
                    @IsBusinessOrItOwner = 1;

        FETCH NEXT FROM employee_cursor INTO @employee_id, @employee_email;
    END

    CLOSE employee_cursor;
    DEALLOCATE employee_cursor;
    
    SELECT DISTINCT * 
    FROM #RequestInfodata;

    -- Clean up temporary tables
    DROP TABLE #EmployeeHierarchy;
    DROP TABLE #RequestInfodata;
END;

Stored procedure GetRequestInfosByUser:

ALTER PROCEDURE [dbo].[GetRequestInfosByUser]
    @UserId VARCHAR(50) = NULL,
    @UserEmail VARCHAR(50) = NULL,
    @IsBusinessOrItOwner BIT = 0
AS
BEGIN
    SET NOCOUNT ON;

    -- Base SQL query
    DECLARE @SQL NVARCHAR(MAX) = '

    SELECT
        ri.RequestID,
        ri.RequestCreatedByEmailID,
        ri.NameOftheTool AS ApplicationName,
        ri.ITOwners AS ITowner,
        ri.BusinessOwners AS Businessowner,
        ri.RiskSummaryScore AS RiskScore,
        ri.RequestCreatedBy,
    FROM 
        RequestInfo ri
    LEFT JOIN 
        (SELECT 
             *,
             ROW_NUMBER() OVER (PARTITION BY RequestID ORDER BY DateModified DESC) AS RowNumber
         FROM 
             WorkflowTable) AS wf ON ri.RequestID = wf.RequestID 
                                  AND wf.RowNumber = 1
    WHERE 1 = 1 ';

    -- Append conditions to the WHERE clause based on input parameters
    IF @UserId IS NOT NULL AND @UserId <> 'ALL' AND @IsBusinessOrItOwner = 0
    BEGIN
        SET @SQL = @SQL + ' AND ri.RequestCreatedBy = @UserId';
    END
    ELSE IF @UserId IS NOT NULL AND @UserEmail IS NOT NULL AND @IsBusinessOrItOwner = 1
    BEGIN
        SET @SQL = @SQL + ' AND (ri.RequestCreatedBy = @UserId OR ri.ITOwners = @UserEmail OR   
ri.BusinessOwners = @UserEmail)';
    END

    -- Execute the SQL query
    EXEC sp_executesql @SQL, N'@UserId VARCHAR(50), @UserEmail VARCHAR(50)', @UserId, @UserEmail;
END;

For the CEO (with 30,000 employees), the GetCombinedRequestInfosByEmail stored procedure takes around 55 seconds, whereas for managers with 10 or 7 employees, it executes in milliseconds. I am looking to optimize this stored procedure for better performance, particularly for high-level employees with large hierarchies.

User table:

UserID Email ManagerUserID
1 ceo@gmail.com NULL
2 mgr1@gmail.com 1
3 mgr2@gmail.com 1
4 emp1@gmail.com 2
5 emp2@gmail.com 2
6 emp3@gmail.com 3

RequestInfo:

RequestID RequestCreatedBy ITOwners BusinessOwners NameOfModule RiskScore RequestSubmitted
1001 4 mgr1@gmail.com mgr1@gmail.com Module1 5 2024-05-01 08:00:00.000
1002 5 mgr1@gmail.com mgr1@gmail.com Module2 3 2024-05-02 08:00:00.000
1003 6 mgr2@gmail.com mgr2@gmail.com Module3 4 2024-05-03 08:00:00.000

When executing the GetCombinedRequestInfosByEmail stored procedure for ceo@gmail.com, I expect to get the combined request information for all employees under the CEO (including managers and their subordinates).

Here is the expected output:

RequestID RequestCreatedByEmailID ApplicationName ITowner Businessowner RiskScore RequestCreatedBy
1001 emp1@gmail.com Module1 mgr1@gmail.com mgr1@gmail.com 5 4
1002 emp2@gmail.com Module2 mgr1@gmail.com mgr1@gmail.com 3 5
1003 emp3@gmail.com Module3 mgr2@gmail.com mgr2@gmail.com 4 6

The main issue is that for the CEO, who has 30,000 employees, the execution time is around 55 seconds, while for managers with fewer employees (10-7), it executes in milliseconds.

How can I optimize the GetCombinedRequestInfosByEmail stored procedure to reduce the execution time for high-level employees (e.g., CEO with 30,000 employees)?


Solution

  • You could improve your current setup by changing GetRequestInfosByUser into an inline Table Valued Function, and then CROSS APPLY it back to your original query. You would also need an UPDATE STATISTICS on the temp table to get a recompilation for the correct number of rows (for a large number you probably want a hash or merge join).

    But it's probably better to just start afresh with a single query that can get all the data.

    WITH EmployeeHierarchy AS (
            SELECT 
                u.userid AS employee_id, 
                0 AS level
            FROM UserTable u
            WHERE u.Email = @Email
            UNION ALL
            SELECT 
                u.userid AS employee_id, 
                eh.level + 1
            FROM UserTable u
            JOIN EmployeeHierarchy eh ON u.manageruserid = eh.employee_id
        ),
    RelevantEmployees AS (
        SELECT DISTINCT
                employee_id
        FROM EmployeeHierarchy 
        WHERE level > 0
    )
    
        SELECT
               ri.RequestID,
               ri.RequestCreatedByEmailID,
               ri.NameOftheTool AS ApplicationName,
               ri.ITOwners AS ITowner,
               ri.BusinessOwners AS Businessowner,
               ri.RiskSummaryScore AS RiskScore,
               ri.RequestCreatedBy,
               CASE
                   WHEN wf.CurrentStatus = 'Started' AND wf.NextActionBy = 'SubmitterUser' AND        wf.Category = 'Workflow' AND ri.RequestID IS NOT NULL THEN 'Draft'
                   WHEN wf.CurrentStatus = 'Started' AND wf.NextActionBy = 'PeerUser' AND wf.Category = 'Workflow' AND ri.RequestID IS NOT NULL THEN 'Pending for Peer reviewer approval'
                   WHEN wf.CurrentStatus = 'Started' AND wf.NextActionBy = 'ManagerUser' AND wf.Category = 'Workflow' AND ri.RequestID IS NOT NULL THEN 'Pending/Reprocess for Manager reviewer approval'
                   WHEN wf.CurrentStatus = 'Rejected' AND wf.NextActionBy = 'SubmitterUser' AND wf.Category = 'Workflow' AND ri.RequestID IS NOT NULL THEN 'Manager Rejected(needs more clarifications)'
                   WHEN wf.CurrentStatus = 'Started' AND wf.NextActionBy = 'SecurityUser' AND wf.Category = 'Workflow' AND ri.RequestID IS NOT NULL THEN 'Pending/Reprocess for Security reviewer approval'
                   WHEN wf.CurrentStatus = 'Rejected' AND wf.NextActionBy = 'SubmitterUser' AND wf.Category = 'WorkFlow' AND ri.RequestID IS NOT NULL AND wf.WhoFilledInCapacityOf = 'SecurityUser' THEN 'Security Team Rejected(needs more clarifications)'
                   WHEN wf.CurrentStatus = 'Started' AND wf.NextActionBy = 'RiskSummaryUser' AND wf.Category = 'Workflow' AND ri.RequestID IS NOT NULL THEN 'Risk Mitigation plan Started'
                   WHEN wf.CurrentStatus = 'Started' AND wf.NextActionBy = 'SecurityUser' AND wf.Category = 'Exception' AND ri.RequestID IS NOT NULL THEN 'Exception Submitted'
                   WHEN wf.CurrentStatus = 'Approved' AND wf.NextActionBy = 'SecurityUser' AND wf.Category = 'Exception' AND ri.RequestID IS NOT NULL THEN 'Exception Approved'
                   WHEN wf.CurrentStatus = 'Completed' AND wf.NextActionBy = 'SecurityUser' AND wf.Category = 'Exception' AND ri.RequestID IS NOT NULL THEN 'Exception Closed'
                   WHEN wf.CurrentStatus = 'Completed' AND wf.NextActionBy = 'RiskSummaryUser' AND wf.Category = 'WorkFlow' AND ri.RequestID IS NOT NULL THEN 'Risk Completed(No Exception)'
                   ELSE null
               END AS CurrentStage
        FROM RequestInfo ri
        LEFT JOIN (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY RequestID ORDER BY DateModified DESC) AS RowNumber
            FROM WorkflowTable
        ) AS wf ON ri.RequestID = wf.RequestID AND wf.RowNumber = 1
        WHERE
          ri.RequestCreatedBy IN (SELECT employee_id FROM RelevantEmployees)      
    ;