Tags: sql, sql-server, query-optimization, sqlgeography

Slow performance of Latitude/Longitude conversion in query


I have a performance problem with a SQL query running on SQL Server 2019 that returns the lat/long from a geography column.
My query is as follows and it takes around 5 seconds to return 553 rows:

SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
       ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
WHERE CustomerID = 35041

The query plan it generates is: Slow Query Plan

But if I change the query a little to return a bit less data, it takes 0 seconds to return 207 rows:

SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
       ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
FROM Plans
INNER JOIN Activities ON Plans.ID = Activities.PlanID
INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
WHERE PlanID > 22486

And the query plan is: Fast Query Plan

I guess my question is: why does the Compute Scalar operation happen before the join in the slow query and after the join in the fast query? I don't understand why it would do the Lat/Long computation on every row of the ActivityLocations table when we only want a small subset of the rows.

Any help would be greatly appreciated.

Edit: added table information

CREATE TABLE [dbo].[Activities](
    [ID] [int] NOT NULL,
    [PlanID] [int] NOT NULL,
    [Name] [nvarchar](255) NOT NULL,
 CONSTRAINT [PK_Activity] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[ActivityLocations](
    [ID] [int] NOT NULL,
    [ActivityID] [int] NOT NULL,
    [Number] [int] NOT NULL,
    [Location] [geography] NOT NULL,
 CONSTRAINT [PK_ActivityLocations] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

CREATE TABLE [dbo].[ActivityPlans](
    [ID] [int] NOT NULL,
    [CustomerID] [int] NOT NULL,
    [PurchaseOrder] [nvarchar](255) NULL,
    [Deleted] [bit] NOT NULL,
    [Name] [nvarchar](500) NULL,
 CONSTRAINT [PK_ActivityPlan] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [PlanID_IX] ON [dbo].[Activities]
(
    [PlanID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ActivityID_IX] ON [dbo].[ActivityLocations]
(
    [ActivityID] ASC
)
INCLUDE([Number],[Location]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [CustomerID_NCIX] ON [dbo].[ActivityPlans]
(
    [CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

Solution

  • NOTE: The OP changed the name of the table Plans to ActivityPlans between the originally posted queries/graphical execution plans and the subsequently provided execution plan (via Paste The Plan) and DDL scripts. I am going to speak in terms of the table Plans (as it was originally posted).

    I am going to try to explain this in its entirety, including noting the things that are not an issue between these two queries, because the obvious differences you see when looking at the two queries are not, in themselves, what causes the problem. Read on for my explanation.

    Similarities

    First, let's talk about where the two execution plans are similar. It is important to note the similarities because (assuming the second query plan is acceptable to the OP) the issue does not lie in what the two query plans have in common.

    Differences

    Now let's talk about the (important) differences, which is where the issue lies.

    What is the actual problem?

    Put simply, the execution plan shows that the first query reads much more data. The roughly 300k rows read from the table ActivityLocations in the first query are far more than the 207 rows read in the second query. Additionally, the Compute Scalar operator of the first query has to compute values for (the same) roughly 300k rows, as opposed to the 207 rows of the second query. This obviously leads to a longer-running query.

    It is also worth noting that the larger row count coming from the table ActivityLocations is the reason for the Merge Join (seen in the first query plan) in place of the Nested Loops Join operator (seen in the second query plan). According to the optimizer, given your environment, a Merge Join is more suitable for joining 300k rows to 3.3k rows than a Nested Loops Join would be. A Merge Join also requires both sides of the join to be sorted on the join columns, which is why the first query plan contains the additional Sort operator.

    Why is it happening?

    Estimates. Estimates drive the optimizer's decision making. In the first query we see that the estimated number of rows to read from the table ActivityLocations (from the Index Scan) is 331,523, and in the second query (from the Index Seek) we see an estimate of 9. It might seem odd to say, but these estimates are more reasonable than they appear. An Index Scan (against up-to-date statistics) will usually have a row estimate equal to the number of rows in the table (except for filtered indexes). An Index Seek would ideally estimate a lower number of rows than the table contains. Ideally that number would match the actual number of rows the Index Seek has to touch, but the fact that your Index Seek estimate is lower than the entire table is a step in the right direction.

    So if the issue is not with the estimates in the Index Scan or Index Seek, then where is it? The issue is in the choice to access the table ActivityLocations with an Index Scan in the first query rather than an Index Seek. So why does the first query choose an Index Scan? It is quite clear from the execution plan that an Index Seek would have been the better option. I believe the answer here is the cardinality estimation, specifically the cardinality estimation used for the join to the table ActivityLocations.

    We see that the estimated number of rows in the output of the final join of the first query increases from its input Sort operator (3341 -> 3402), while the estimated number of rows in the output of the final join of the second query remains consistent with its input Nested Loops operator (207 -> 207). And not only did the optimizer estimate this, it was right: the actual row counts returned from those same operators show the same pattern.

    Why does that matter? It means that, based on the optimizer's estimate, the join to the table ActivityLocations is going to increase the row count of the input result set; in other words, this join is 1 (input row) to many (output rows). Keep in mind that the optimizer needs to return your requested values ActivityLocations.Location.Lat and ActivityLocations.Location.Long from the table ActivityLocations. So when it considers a join that it believes will multiply the rows coming out of ActivityLocations, knowing it also has to perform a Compute Scalar on columns output from that table, it makes sense to run the Compute Scalar before the join: run before the join, the Compute Scalar is guaranteed to execute only once per row of ActivityLocations, a guarantee that cannot be made if it runs after the join.

    In this case, the join is actually what ends up limiting the rows from ActivityLocations, and the number of rows returned from that table (for the purposes of this query) is much lower than the table's row count. In the second query, the estimate says the output row count will stay the same, so running the Compute Scalar after the join makes no difference to the number of rows that need the computation, and it makes sense to perform the Index Seek in place of the Index Scan.
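
    For example, a quick aggregate like the sketch below (built from the tables in the question) shows how many ActivityLocations rows each Activity contributes for this customer, i.e. the 1-to-many relationship the optimizer is estimating:

    SELECT Activities.ID AS ActivityID,
           COUNT(*)      AS LocationCount
    FROM Plans
    INNER JOIN Activities ON Plans.ID = Activities.PlanID
    INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
    WHERE CustomerID = 35041          -- same filter as the slow query
    GROUP BY Activities.ID
    ORDER BY LocationCount DESC;      -- any count > 1 makes the join 1-to-many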

    So, to summarize: the rows returned from the first two tables of your first and second query (given your WHERE clauses) are different, and the rows returned in the first query likely resulted in a join estimation with a different cardinality than the second query. Hence the differences in how the query plans were built and how they subsequently ran.

    Cardinality estimation (with joins specifically) is made up of several factors. If you really want to get into the internals I would recommend these two articles from the legend Paul White and SQL Shack. The things discussed there should guide you on how to review the estimation within your system.
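
    If you want to experiment with how the estimation model affects the plan choice, one thing you can try (a sketch; the USE HINT syntax requires SQL Server 2016 SP1 or later and only affects this one statement) is re-running the slow query under the legacy cardinality estimator and comparing the plans:

    SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
           ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
    FROM Plans
    INNER JOIN Activities ON Plans.ID = Activities.PlanID
    INNER JOIN ActivityLocations ON Activities.ID = ActivityLocations.ActivityID
    WHERE CustomerID = 35041
    OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));  -- per-statement only; does not change the database setting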

    How to fix it?

    The first goal would be to improve the estimate. If the estimated cardinality for the join were incorrect (which is actually not the case here), then updating statistics might help. Out-of-date statistics can lead to a bad cardinality estimate, and thus a bad query plan.
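
    If you want to rule that out, a quick check (a sketch; statistics names will vary in your database) is to look at when the statistics on ActivityLocations were last updated and how many modifications have accumulated since, and refresh them if they look stale:

    -- How fresh are the statistics on ActivityLocations?
    SELECT s.name AS stats_name, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.ActivityLocations');

    -- Refresh them with a full scan if they are out of date
    UPDATE STATISTICS dbo.ActivityLocations WITH FULLSCAN;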

    In some cases you might be able to restructure your query into a logical equivalent that produces a better execution plan, either by writing it in a way that yields better estimates or possibly by returning different rows in a different order. In this case, the first query looks fine to me, so rewriting it as a logical equivalent would likely not help.

    In this case the cardinality estimate is correct (and you mentioned that you updated statistics and it did not help), the query appears to be well written, but the chosen execution plan is still sub-optimal. So I would recommend a hint. This problem is easily solved with a table hint that forces a seek on the index ActivityID_IX of ActivityLocations. The join in your first query would look like this:

    INNER JOIN ActivityLocations WITH (FORCESEEK, INDEX(ActivityID_IX)) ON Activities.ID = ActivityLocations.ActivityID
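
    Applied to your full first query, that would look something like this (a sketch; check that the new plan shows an Index Seek on ActivityID_IX):

    SELECT ActivityLocations.ID, ActivityLocations.ActivityID, ActivityLocations.Number,
           ActivityLocations.Location.Lat AS 'Latitude', ActivityLocations.Location.Long AS 'Longitude'
    FROM Plans
    INNER JOIN Activities ON Plans.ID = Activities.PlanID
    INNER JOIN ActivityLocations WITH (FORCESEEK, INDEX(ActivityID_IX))  -- force the seek on the nonclustered index
        ON Activities.ID = ActivityLocations.ActivityID
    WHERE CustomerID = 35041;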
    

    There is plenty of information out there on why forcing hints like this might be a bad idea, but given the information I have here I would say it is the best option. I am always open to other opinions. Cheers!