I have a performance problem with a SQL query running on SQL Server 2019 that returns the lat/long from a geography column.
My query is as follows and it takes around 5 seconds to return 553 rows:
SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number, ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
WHERE CustomerID = 35041
The query plan it generates is:
But if I change the query a little to return a bit less data, it takes 0 seconds to return 207 rows:
SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number, ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
WHERE PlanID > 22486
I guess my question is: why is the Compute Scalar operation happening before the join in the slow query and after the join in the fast query? I don't understand why it would do the Lat/Long computation on every row of the ActivityLocations table when we only want a small subset of the rows.
Any help would be greatly appreciated.
Edited to contain table information
CREATE TABLE [dbo].[Activities](
[ID] [int] NOT NULL,
[PlanID] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
CONSTRAINT [PK_Activity] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[ActivityLocations](
[ID] [int] NOT NULL,
[ActivityID] [int] NOT NULL,
[Number] [int] NOT NULL,
[Location] [geography] NOT NULL,
CONSTRAINT [PK_ActivityLocations] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
CREATE TABLE [dbo].[ActivityPlans](
[ID] [int] NOT NULL,
[CustomerID] [int] NOT NULL,
[PurchaseOrder] [nvarchar](255) NULL,
[Deleted] [bit] NOT NULL,
[Name] [nvarchar](500) NULL,
CONSTRAINT [PK_ActivityPlan] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [PlanID_IX] ON [dbo].[Activities]
(
[PlanID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [ActivityID_IX] ON [dbo].[ActivityLocations]
(
[ActivityID] ASC
)
INCLUDE([Number],[Location]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [CustomerID_NCIX] ON [dbo].[ActivityPlans]
(
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
NOTE: The OP changed the name of the table Plans to ActivityPlans between the originally posted queries/graphical execution plans and the subsequently provided execution plan (via Paste The Plan) and DDL scripts. I am going to speak in terms of the table Plans (as it was originally posted).
I am going to try to explain this in its entirety, including noting the things that are not an issue between these two queries, because the obvious differences you see when comparing the two queries are not, per se, what is causing the problem. Read on for my explanation.
First, let's talk about where the two execution plans are similar. It is important to note the similarities because (assuming the second query plan is acceptable to the OP) the issue does not lie in what the two query plans have in common.
- Both execution plans start by accessing the two tables Plans and Activities.
- Both queries resolve your WHERE clause, which is good. The optimizer makes a good decision and is able to use an index seek in both cases. The filter WHERE CustomerID = 35041 is resolved to an index seek on the index CustomerID_NCIX of the Plans table, and the filter WHERE PlanID > 22486 is resolved to an index seek on the index PlanID_IX of the Activities table. Then the join to the subsequent table (Activities in the first query and Plans in the second query) is done. It is supported by an index in both cases, the estimates for the seek operations are not horrific, and both joins are done using Nested Loops that output a number of rows relatively close to the final result set. So even though the only visual difference between these two queries is the WHERE clause, the WHERE clause in each query is being handled fairly similarly and does not appear to be an issue.
- Both queries access the table ActivityLocations using the index ActivityID_IX.
- Both queries have a Compute Scalar operator that retrieves the required values for the expressions ActivityLocations.Location.Lat and ActivityLocations.Location.Long (from ActivityLocations.Location) in your SELECT statement.

Now let's talk about the (important) differences, which is where the issues lie.
- The first query accesses the table ActivityLocations using an Index Scan operator, while the second query uses an Index Seek operator.
- The Index Scan operator that accesses the table ActivityLocations in the first query has an Actual/Estimated Row Count of 329,475/331,523, while the Index Seek operator that accesses the table ActivityLocations in the second query has an Actual/Estimated Row Count of 207/9.
- The first query joins ActivityLocations to the result of Plans and Activities using a Merge Join, while the second query uses a Nested Loop Join.

Put simply, the first query is reading much more data when we look at the execution plan. The roughly 300k rows read from the table ActivityLocations in the first query are far more than the 207 rows read in the second query. Additionally, the Compute Scalar operator of the first query has to compute values for (the same) roughly 300k rows, as opposed to the 207 rows of the second query. This obviously leads to a longer-running query.
It is also worth noting that the larger row count coming from the table ActivityLocations is the reason for the Merge Join (seen in the first query plan) in place of the Nested Loop Join operator (seen in the second query plan). According to the optimizer, given your environment, a Merge Join is more suitable for joining 300k rows to 3.3k rows than a Nested Loop Join would be. And the use of a Merge Join requires both sides of the join to be sorted by the join columns, hence the additional Sort operator in the query plan of the first query.
Estimates. Estimates drive the optimizer's decision-making. In the first query we see that the estimated number of rows to read from the table ActivityLocations (from the Index Scan) is 331,523, and in the second query (from the Index Seek) we see an estimate of 9. It might seem odd to say, but these estimates are more reasonable than you might think. An Index Scan (with up-to-date statistics) is usually going to have a row estimate equivalent to the number of rows in the table (except for filtered indexes). An Index Seek would ideally estimate a lower number of rows than what is contained in the table. Ideally that number would match the actual number of rows that the Index Seek is required to touch, but the fact that your Index Seek estimate is lower than the entire table is a step in the right direction.
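If you want to see the information the optimizer has about how many ActivityLocations rows exist per ActivityID value (which is what feeds the seek-side estimate), you can inspect the statistics behind the ActivityID_IX index. A minimal sketch, using the object names from the DDL above:

-- Show the histogram for the statistics on ActivityID_IX.
-- Each step shows how many rows the optimizer expects for a range of ActivityID values.
DBCC SHOW_STATISTICS ('dbo.ActivityLocations', ActivityID_IX) WITH HISTOGRAM;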
So if the issue is not with the estimates in the Index Scan or Index Seek, then where is it? The issue is in the choice to access the table ActivityLocations using an Index Scan in the first query rather than an Index Seek. So why does the first query choose an Index Scan? It is quite clear from the execution plan that an Index Seek would have been the better option. I believe the answer in this case is the cardinality estimation, specifically the cardinality estimation used for the join to the table ActivityLocations.
We see that the estimated number of rows in the output of the final join of the first query increases from its input Sort operator (3341 -> 3402), while the estimated number of rows in the output of the final join of the second query remains consistent with its input Nested Loop operator (207 -> 207). And not only did the optimizer estimate this, it was right: the actual row counts returned from those same operators show the same pattern.
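If you want to confirm that one-to-many fan-out yourself, a quick check (a sketch, using the table name from the DDL) is to count the location rows per activity:

-- Activities with more than one location are what allow the join to increase the row count.
SELECT ActivityID, COUNT(*) AS LocationCount
FROM dbo.ActivityLocations
GROUP BY ActivityID
ORDER BY LocationCount DESC;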
Why does that matter? It means that, based on the optimizer's estimate, the join to the table ActivityLocations is going to increase the row count of the input result set; in other words, this join is one (input row) to many (output rows). Keep in mind that the optimizer needs to return your requested values ActivityLocations.Location.Lat and ActivityLocations.Location.Long from the table ActivityLocations. So when the optimizer considers a join that it believes will multiply the rows coming out of ActivityLocations, while also knowing that it needs to perform a Compute Scalar on columns output from that table, it makes sense to run the Compute Scalar before the join: doing so guarantees the Compute Scalar runs only once per row of ActivityLocations, something it cannot guarantee if the Compute Scalar runs after the join. In this case, though, the join is actually what ends up limiting the rows from ActivityLocations, and the number of rows returned from that table (for the purposes of this query) is much lower than the row count of the table. In the second query, the estimate says the output row count will stay the same, so running the Compute Scalar after the join makes no difference to the number of rows that need to be computed, and it makes sense to perform the Index Seek in place of the Index Scan.
So, to summarize: the rows returned from the first two tables of your first and second queries (via your respective WHERE clauses) are different, and it is likely that the rows returned in the first query produced a join estimate with a different cardinality pattern than the second query. Hence the differences in how the query plan was built and how it was subsequently run.
Cardinality estimation (with joins specifically) is made up of several factors. If you really want to get into the internals, I would recommend these two articles, one from the legend Paul White and one from SQL Shack. The things discussed there should guide you in reviewing the estimation within your system.
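One quick, purely diagnostic experiment (not a fix) is to compare the join estimates the slow query gets under the legacy cardinality estimator. A sketch, reusing the original query text:

-- Compare the plan and estimates under the legacy CE model.
-- This only helps show whether the CE model is driving the scan-side plan shape.
SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
       ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
WHERE CustomerID = 35041
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));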
The first goal would be to improve the estimate. If the estimated cardinality for the join was incorrect (which is actually not the case here) then updating the statistics might help. Out of date statistics could lead to a bad cardinality estimate, and thus a bad query plan.
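For reference, refreshing the statistics that feed these estimates would look something like this (a sketch using the index names from the DDL; FULLSCAN is optional but gives the most complete histogram):

-- Rebuild the statistics on the indexes used by the joins in this query.
UPDATE STATISTICS dbo.Activities PlanID_IX WITH FULLSCAN;
UPDATE STATISTICS dbo.ActivityLocations ActivityID_IX WITH FULLSCAN;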
In some cases you might be able to restructure your query into a logical equivalent that results in a better execution plan, either by writing it in a way that produces better estimates or even possibly by returning different rows or in a different order (one common pattern is sketched below). In this case, I would say the first query looks fine to me, and thus rewriting it as a logical equivalent would likely not help.
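For illustration only (not a recommendation for this specific case), such a restructuring could filter down to the matching ActivityLocations rows in a derived table and only then pull out the coordinates. The object names are those from the question, and the optimizer is free to collapse this back into the original plan shape, so it may not change anything:

-- Hypothetical restructuring: do the joins and filter first, then compute Lat/Long
-- only on the surviving rows.
SELECT f.ID, f.ActivityID, f.Number,
       f.Location.Lat AS 'Latitude', f.Location.Long AS 'Longitude'
FROM (
    SELECT ActivityLocations.ID, ActivityLocations.ActivityID,
           ActivityLocations.Number, ActivityLocations.Location
    FROM Plans
    INNER JOIN Activities ON Plans.ID = Activities.PlanID
    INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
    WHERE CustomerID = 35041
) AS f;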
In this case, the cardinality estimate is correct (and you mentioned that you updated statistics and it did not help), and the query appears to be written in a good fashion, but the execution plan chosen is still sub-optimal. So I would recommend a query hint. This problem would easily be solved with a hint that forces a seek on the index ActivityID_IX of ActivityLocations. The join in your first query would look like so:
INNER JOIN ActivityLocations WITH (FORCESEEK, INDEX(ActivityID_IX)) ON Activities.ID = ActivityLocations.ActivityID
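For context, the full first query with that hint in place would look like this (only the join line changes):

SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
       ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
-- Force a seek on ActivityID_IX so only the joined rows reach the Compute Scalar.
INNER JOIN ActivityLocations WITH (FORCESEEK, INDEX(ActivityID_IX)) ON Activities.ID = ActivityLocations.ActivityID
WHERE CustomerID = 35041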
There is plenty of information out there on why query hints might be a bad idea, but given the information I have here I would say that is the best option. I am always open to other opinions. Cheers!