sqlsql-servercommon-table-expressiongaps-and-islands

Detect 1st row of an ordinal chain using recursive CTE or alternative


I have the following data:

UserId SubId EduId SemOrd
1 701 500 10
1 702 255 11
1 703 500 12
1 704 500 14
1 705 500 15
1 706 500 16
1 707 500 17
2 ... ... ..

Given a UserId and SubId, I want to find the smallest SemOrd in the same EduId chain.
For example: given UserId = 1 and SubId = 706, the result should be SemOrd = 12.
Since it implies a chain, I went for recursive CTE solution.

;WITH Base AS (
    SELECT
        UserId,
        SubId,
        EduId,
        SemOrd,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY SemOrd ASC) AS Ordinal,
        ROW_NUMBER() OVER (PARTITION BY UserId, EduId ORDER BY SemOrd ASC) AS OrdinalParted
    FROM /* ... */
), Chain AS (
    SELECT
        SemOrd AS SemOrd_First,
        UserId,
        SubId,
        EduId,
        SemOrd,
        Ordinal,
        OrdinalParted
    FROM Base
    UNION ALL
    SELECT
        Chain.SemOrd_First,
        Base.UserId,
        Base.SubId,
        Base.EduId,
        Base.SemOrd,
        Base.Ordinal,
        Base.OrdinalParted
    FROM
        Base
        INNER JOIN Chain
            ON Base.UserId = Chain.UserId
            AND Base.EduId = Chain.EduId
            AND Base.Ordinal = Chain.Ordinal + 1
            AND Base.OrdinalParted = Chain.OrdinalParted + 1
), Minimal AS (
    SELECT
        UserId,
        SubId,
        MIN(SemOrd_First) AS SemOrd_First
    FROM Chain
    GROUP BY
        UserId,
        SubId
)
SELECT SemOrd_First FROM Minimal WHERE UserId = 1 AND SubId = 706

On the paper and on small tables, it works.
But on the production tables, it is basically unusable as it needs to compute first all CTE entries before doing the WHERE filter. The query is way too slow.

Of course, one solution would be to move the WHERE clause inside the 1st CTE:

;WITH Base AS (
    SELECT
        UserId,
        SubId,
        EduId,
        SemOrd,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY SemOrd ASC) AS Ordinal,
        ROW_NUMBER() OVER (PARTITION BY UserId, EduId ORDER BY SemOrd ASC) AS OrdinalParted
    FROM /* ... */
    WHERE UserId = 1 AND SubId = 706
), Chain AS (
/* ... */

And then it is fast. But I would like to create a view out of this query.
Because of this, the WHERE clause cannot be inside the 1st CTE, otherwise we cannot choose the UserId and SubId when calling the view.

The problem is that SQL Server is not able to bubble up the WHERE clause inside the 1st CTE and reduce the amount of work in the execution plan.

So the questions are:

If none of the above could work, I guess my best bet would be a stored procedure instead.


Solution

  • You are overthinking this. This is a type of gaps-and-islands problem, and there are many simpler solutions.

    You can do it in a single pass of the base table, by using LAG to get the previous value (descending), and a windowed conditional COUNT comparing each row with its previous (to check it's a chain) will give us a unique ID for the chain.

    Then just group it up and take the minimum SemOrd, using only the chain which contains the value we want.

    WITH NextValues AS (
        SELECT t.*,
          LAG(t.EduId) OVER (PARTITION BY t.UserId ORDER BY t.SemOrd DESC) AS PrevEduId
        FROM YourTable t
        WHERE t.UserId = 1
    ),
    Counted AS (
        SELECT *,
          COUNT(CASE WHEN t.EduId <> t.PrevEduId OR t.PrevEduId IS NULL THEN 1 END)
              OVER (PARTITION BY t.UserId ORDER BY t.SemOrd DESC ROWS UNBOUNDED PRECEDING) AS ChainId
        FROM NextValues t
    )
    SELECT
      t.UserId,
      MIN(t.SemOrd) AS SemOrd_First
    FROM Counted t
    GROUP BY
      t.UserId,
      t.ChainId
    HAVING COUNT(CASE WHEN t.SubId = 706 THEN 1 END) > 0;
    

    db<>fiddle