I have the following data:
UserId | SubId | EduId | SemOrd |
---|---|---|---|
1 | 701 | 500 | 10 |
1 | 702 | 255 | 11 |
1 | 703 | 500 | 12 |
1 | 704 | 500 | 14 |
1 | 705 | 500 | 15 |
1 | 706 | 500 | 16 |
1 | 707 | 500 | 17 |
2 | ... | ... | .. |
Given a UserId
and SubId
, I want to find the smallest SemOrd
in the same EduId
chain.
For example: given UserId = 1
and SubId = 706
, the result should be SemOrd = 12
.
Since it implies a chain, I went for recursive CTE solution.
;WITH Base AS (
SELECT
UserId,
SubId,
EduId,
SemOrd,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY SemOrd ASC) AS Ordinal,
ROW_NUMBER() OVER (PARTITION BY UserId, EduId ORDER BY SemOrd ASC) AS OrdinalParted
FROM /* ... */
), Chain AS (
SELECT
SemOrd AS SemOrd_First,
UserId,
SubId,
EduId,
SemOrd,
Ordinal,
OrdinalParted
FROM Base
UNION ALL
SELECT
Chain.SemOrd_First,
Base.UserId,
Base.SubId,
Base.EduId,
Base.SemOrd,
Base.Ordinal,
Base.OrdinalParted
FROM
Base
INNER JOIN Chain
ON Base.UserId = Chain.UserId
AND Base.EduId = Chain.EduId
AND Base.Ordinal = Chain.Ordinal + 1
AND Base.OrdinalParted = Chain.OrdinalParted + 1
), Minimal AS (
SELECT
UserId,
SubId,
MIN(SemOrd_First) AS SemOrd_First
FROM Chain
GROUP BY
UserId,
SubId
)
SELECT SemOrd_First FROM Minimal WHERE UserId = 1 AND SubId = 706
On the paper and on small tables, it works.
But on the production tables, it is basically unusable as it needs to compute first all CTE entries before doing the WHERE
filter. The query is way too slow.
Of course, one solution would be to move the WHERE
clause inside the 1st CTE:
;WITH Base AS (
SELECT
UserId,
SubId,
EduId,
SemOrd,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY SemOrd ASC) AS Ordinal,
ROW_NUMBER() OVER (PARTITION BY UserId, EduId ORDER BY SemOrd ASC) AS OrdinalParted
FROM /* ... */
WHERE UserId = 1 AND SubId = 706
), Chain AS (
/* ... */
And then it is fast. But I would like to create a view out of this query.
Because of this, the WHERE
clause cannot be inside the 1st CTE, otherwise we cannot choose the UserId
and SubId
when calling the view.
The problem is that SQL Server is not able to bubble up the WHERE
clause inside the 1st CTE and reduce the amount of work in the execution plan.
So the questions are:
WHERE
clause?If none of the above could work, I guess my best bet would be a stored procedure instead.
You are overthinking this. This is a type of gaps-and-islands problem, and there are many simpler solutions.
You can do it in a single pass of the base table, by using LAG
to get the previous value (descending), and a windowed conditional COUNT
comparing each row with its previous (to check it's a chain) will give us a unique ID for the chain.
Then just group it up and take the minimum SemOrd
, using only the chain which contains the value we want.
WITH NextValues AS (
SELECT t.*,
LAG(t.EduId) OVER (PARTITION BY t.UserId ORDER BY t.SemOrd DESC) AS PrevEduId
FROM YourTable t
WHERE t.UserId = 1
),
Counted AS (
SELECT *,
COUNT(CASE WHEN t.EduId <> t.PrevEduId OR t.PrevEduId IS NULL THEN 1 END)
OVER (PARTITION BY t.UserId ORDER BY t.SemOrd DESC ROWS UNBOUNDED PRECEDING) AS ChainId
FROM NextValues t
)
SELECT
t.UserId,
MIN(t.SemOrd) AS SemOrd_First
FROM Counted t
GROUP BY
t.UserId,
t.ChainId
HAVING COUNT(CASE WHEN t.SubId = 706 THEN 1 END) > 0;