sqlsql-serverquery-optimization

Query with aggregate functions is very slow


I have the following query:

WITH RankedCallNotes AS (
    SELECT
        Id,
        CreatedOn,
        CandidateId,
        Text,
        ROW_NUMBER() OVER (PARTITION BY CandidateId ORDER BY CreatedOn DESC) AS rn
    FROM
        [dbo].[CandidateCallNotes]
)

SELECT 
    c.Id,
    c.FirstName,
    c.LastName,
    c.AvailableDate,
    c.Email,
    cm0.Modality AS PrimaryModality,
    cm1.Modality AS Modality2,
    c.Phone,
    s.State AS HomeState,
    STRING_AGG(cert.CertName, ', ') AS Certificates,
    c.ActiveStateLicenseId,
    r.Name AS Rep,
    n.CreatedOn AS MostRecentCallTsp,
    n.Text AS MostRecentCall
FROM [dbo].[Candidates] c
    LEFT JOIN 
        (
            SELECT cm.CandidateId,
                cm.ModalityId,
                m.Name AS Modality,
                cm."Order" 
            FROM [dbo].[CandidateModalities] cm
            JOIN [dbo].[Modalities] m ON m.Id = cm.ModalityId
            WHERE "Order" = 0
        ) AS cm0 ON c.Id = cm0.CandidateId
    LEFT JOIN 
        (
            SELECT cm.CandidateId,
                cm.ModalityId,
                m.Name AS Modality,
                cm."Order" 
            FROM [dbo].[CandidateModalities] cm
            JOIN [dbo].[Modalities] m ON m.Id = cm.ModalityId
            WHERE "Order" = 1
        ) AS cm1 ON c.Id = cm1.CandidateId
    LEFT JOIN
        [dbo].[States] s ON c.HomeStateId = s.Id
    LEFT JOIN
        [dbo].[CandidateModalities] cm ON cm.CandidateId = c.Id
    LEFT JOIN
        (
            SELECT CertName,
                CandidateModalityId 
            FROM [dbo].[CandidateCertifications]
            WHERE CertName IS NOT NULL
            AND CertName <> ''
        ) cert ON cm.Id = cert.CandidateModalityId
    LEFT JOIN
        [dbo].[Reps] r ON c.RepId = r.Id
    LEFT JOIN
        RankedCallNotes n ON n.CandidateId = c.Id
        AND n.rn = 1
GROUP BY
    c.Id,
    c.FirstName,
    c.LastName,
    c.AvailableDate,
    c.Email,
    cm0.Modality,
    cm1.Modality,
    c.Phone,
    s."State",
    c.ActiveStateLicenseId,
    r.Name,
    n.CreatedOn,
    n.Text

It takes around 40 seconds to run so I'm trying to figure a way to optimize it if I can. For context, the Candidates table is around 40,000 rows, the CandidateModalities table is around 40,000 rows, and the CandidateCallNotes table is around 200,000 rows.

Here's the data plan: https://www.brentozar.com/pastetheplan/?id=0ctqpalAIr

What I've noticed:

  1. There's a Filter operation that the plan says is taking 36983 ms to complete which seems really problematic.
  2. There's a Sort operation that has a warning attached reading Operator used tempdb to spill data during execution with spill level {0} and {1} spilled thread(s) and its actual elapsed time is 36907 ms.
  3. If I remove portions requiring aggregate functions (either for Certificates or for CandidateCallNotes), the execution time improves to <1 second so I assume my issues primarily lie there.

I'm extremely new to query optimization and am not sure exactly what I should be looking for within this plan, how to interpret that, and then what to do to address the issue. I assume indexes would help but I have Azure's automatic indexing in place and want to make sure I don't screw up what it's already doing.


Solution

  • Lots to chew on here. Primarily, your indexing is awful, but you can also improve the query:

    WITH RankedCallNotes AS (
        SELECT
            ccn.Id,
            ccn.CreatedOn,
            ccn.CandidateId,
            ccn.Text,
            ROW_NUMBER() OVER (PARTITION BY ccn.CandidateId ORDER BY ccn.CreatedOn DESC) AS rn
        FROM
            dbo.CandidateCallNotes ccn
    )
    
    SELECT 
        c.Id,
        c.FirstName,
        c.LastName,
        c.AvailableDate,
        c.Email,
        cm.PrimaryModality,
        cm.Modality2,
        c.Phone,
        s.State AS HomeState,
        cm.Certificates,
        c.ActiveStateLicenseId,
        r.Name AS Rep,
        n.CreatedOn AS MostRecentCallTsp,
        n.Text AS MostRecentCall
    FROM dbo.Candidates AS c
    LEFT JOIN (
        SELECT
            cm.CandidateId,
            MIN(CASE WHEN cm."Order" = 0 THEN cm.Name END) AS PrimaryModality,
            MIN(CASE WHEN cm."Order" = 1 THEN cm.Name END) AS Modality2,
            STRING_AGG(cert.CertName, ', ') AS Certificates
        FROM dbo.CandidateModalities cm
        JOIN dbo.Modalities m ON m.Id = cm.ModalityId
        -- what is the relationship here? Maybe worth adding another aggregation level
        LEFT JOIN dbo.CandidateCertifications cert
            ON cm.Id = cert.CandidateModalityId
            AND cert.CertName <> ''
        GROUP BY
            cm.CandidateId
    ) AS cm ON c.Id = cm.CandidateId
    LEFT JOIN
        dbo.States s ON c.HomeStateId = s.Id
    LEFT JOIN
        dbo.Reps r ON c.RepId = r.Id
    LEFT JOIN
        RankedCallNotes n ON n.CandidateId = c.Id
        AND n.rn = 1;
    

    Then you need better indexing.

    CandidateModalities (CandidateId, ModalityId) INCLUDE ("Order", Name)  -- could swap the key order as well
    
    CandidateCertifications (CandidateModalityId) INCLUDE (CertName) WHERE (CertName <> N'')
    
    CandidateCallNotes (CandidateId, CreatedOn DESC) INCLUDE (Text)
    
    OUTER APPLY (
        SELECT TOP (1)
            ccn.*
        FROM
            CandidateCallNotes ccn
        WHERE
            ccn.CandidateId = c.Id
        ORDER BY
            ccn.CreatedOn DESC
    ) AS n