I have the following query:
WITH RankedCallNotes AS (
SELECT
Id,
CreatedOn,
CandidateId,
Text,
ROW_NUMBER() OVER (PARTITION BY CandidateId ORDER BY CreatedOn DESC) AS rn
FROM
[dbo].[CandidateCallNotes]
)
SELECT
c.Id,
c.FirstName,
c.LastName,
c.AvailableDate,
c.Email,
cm0.Modality AS PrimaryModality,
cm1.Modality AS Modality2,
c.Phone,
s.State AS HomeState,
STRING_AGG(cert.CertName, ', ') AS Certificates,
c.ActiveStateLicenseId,
r.Name AS Rep,
n.CreatedOn AS MostRecentCallTsp,
n.Text AS MostRecentCall
FROM [dbo].[Candidates] c
LEFT JOIN
(
SELECT cm.CandidateId,
cm.ModalityId,
m.Name AS Modality,
cm."Order"
FROM [dbo].[CandidateModalities] cm
JOIN [dbo].[Modalities] m ON m.Id = cm.ModalityId
WHERE "Order" = 0
) AS cm0 ON c.Id = cm0.CandidateId
LEFT JOIN
(
SELECT cm.CandidateId,
cm.ModalityId,
m.Name AS Modality,
cm."Order"
FROM [dbo].[CandidateModalities] cm
JOIN [dbo].[Modalities] m ON m.Id = cm.ModalityId
WHERE "Order" = 1
) AS cm1 ON c.Id = cm1.CandidateId
LEFT JOIN
[dbo].[States] s ON c.HomeStateId = s.Id
LEFT JOIN
[dbo].[CandidateModalities] cm ON cm.CandidateId = c.Id
LEFT JOIN
(
SELECT CertName,
CandidateModalityId
FROM [dbo].[CandidateCertifications]
WHERE CertName IS NOT NULL
AND CertName <> ''
) cert ON cm.Id = cert.CandidateModalityId
LEFT JOIN
[dbo].[Reps] r ON c.RepId = r.Id
LEFT JOIN
RankedCallNotes n ON n.CandidateId = c.Id
AND n.rn = 1
GROUP BY
c.Id,
c.FirstName,
c.LastName,
c.AvailableDate,
c.Email,
cm0.Modality,
cm1.Modality,
c.Phone,
s."State",
c.ActiveStateLicenseId,
r.Name,
n.CreatedOn,
n.Text
It takes around 40 seconds to run so I'm trying to figure a way to optimize it if I can. For context, the Candidates table is around 40,000 rows, the CandidateModalities table is around 40,000 rows, and the CandidateCallNotes table is around 200,000 rows.
Here's the data plan: https://www.brentozar.com/pastetheplan/?id=0ctqpalAIr
What I've noticed:
Operator used tempdb to spill data during execution with spill level {0} and {1} spilled thread(s) and its actual elapsed time is 36907 ms.I'm extremely new to query optimization and am not sure exactly what I should be looking for within this plan, how to interpret that, and then what to do to address the issue. I assume indexes would help but I have Azure's automatic indexing in place and want to make sure I don't screw up what it's already doing.
Lots to chew on here. Primarily, your indexing is awful, but you can also improve the query:
Don't group on the outside, and instead push aggregation into subqueries.
Because when you group on the outside and you are not groupig by primary keys then it's going to force the server to do a separate aggregation step for the whole resultset.
Combine the two subqueries on CandidateModalities and its left join into one single subquery, pre-grouped as I suggested.
CertName IS NOT NULL is superfluous.
WITH RankedCallNotes AS (
SELECT
ccn.Id,
ccn.CreatedOn,
ccn.CandidateId,
ccn.Text,
ROW_NUMBER() OVER (PARTITION BY ccn.CandidateId ORDER BY ccn.CreatedOn DESC) AS rn
FROM
dbo.CandidateCallNotes ccn
)
SELECT
c.Id,
c.FirstName,
c.LastName,
c.AvailableDate,
c.Email,
cm.PrimaryModality,
cm.Modality2,
c.Phone,
s.State AS HomeState,
cm.Certificates,
c.ActiveStateLicenseId,
r.Name AS Rep,
n.CreatedOn AS MostRecentCallTsp,
n.Text AS MostRecentCall
FROM dbo.Candidates AS c
LEFT JOIN (
SELECT
cm.CandidateId,
MIN(CASE WHEN cm."Order" = 0 THEN cm.Name END) AS PrimaryModality,
MIN(CASE WHEN cm."Order" = 1 THEN cm.Name END) AS Modality2,
STRING_AGG(cert.CertName, ', ') AS Certificates
FROM dbo.CandidateModalities cm
JOIN dbo.Modalities m ON m.Id = cm.ModalityId
-- what is the relationship here? Maybe worth adding another aggregation level
LEFT JOIN dbo.CandidateCertifications cert
ON cm.Id = cert.CandidateModalityId
AND cert.CertName <> ''
GROUP BY
cm.CandidateId
) AS cm ON c.Id = cm.CandidateId
LEFT JOIN
dbo.States s ON c.HomeStateId = s.Id
LEFT JOIN
dbo.Reps r ON c.RepId = r.Id
LEFT JOIN
RankedCallNotes n ON n.CandidateId = c.Id
AND n.rn = 1;
Then you need better indexing.
CandidateModalities needs indexing over the two FK columns, although I'm not sure which column order is going to work out better.CandidateModalities (CandidateId, ModalityId) INCLUDE ("Order", Name) -- could swap the key order as well
CandidateCertifications has a blank CertName, you should therefore use a filtered index.CandidateCertifications (CandidateModalityId) INCLUDE (CertName) WHERE (CertName <> N'')
CandidateCallNotes should be indexed by the CreatedOn columnCandidateCallNotes (CandidateId, CreatedOn DESC) INCLUDE (Text)
OUTER APPLY for the RankedCallNotesOUTER APPLY (
SELECT TOP (1)
ccn.*
FROM
CandidateCallNotes ccn
WHERE
ccn.CandidateId = c.Id
ORDER BY
ccn.CreatedOn DESC
) AS n