sql-servert-sqlindexingpaginationkeyset-pagination

Pagination from two tables in SQL Server


I have two tables with the following schema:

The tables can have duplicate values between them. For e.g - Table A row (<some-columnA-value>, 1, ...) and Table B row (<some-columnB-value>, 1, ...), 1 being the UserId.

Now, I have an API which is used to fetch all the UserId values from both tables. With increasing data, I want to now use pagination for this API and would like to modify the queries accordingly. There should also not be any duplicates over the pages or within a page.

How do I achieve this? Also a requirement is that I need to use keyset pagination rather than offset pagination since offset pagination gets slower as and when the offset increases.

So far, I have thought of using indexed views since there is only 1 column that I require to fetch but since the data keeps changing quite frequently and in large volumes, the overhead of maintaining the indexed view is not optimal.

Table A:

Column A UserId
x 1
y 2
z 3
w 4

Table B:

Column B UserId
a 1
b 3
c 5
d 6

Result (if no page size):

UserId
1
2
3
4
5
6

Result (if page size 3)

Page 1

UserId
1
2
3

Page 2

UserId
4
5
6

Solution

  • As I mention in my canonical post on Keyset Pagination (paging by key), the base query needs to be efficient.

    The key (sic) in your case, is to use a Merge Union, group up by ID, and then take the top 3. This means that the grouping is over an already sorted set, and can use a Stream Aggregation.

    So first UNION ALL the tables together, then GROUP BY UserId the final result and take TOP (3).

    SELECT TOP (3)
      t.UserId
    FROM (
        SELECT a.UserId
        FROM TableA a
        UNION ALL
        SELECT b.UserId
        FROM TableB b
    ) t
    GROUP BY t.UserId
    ORDER BY UserId;
    

    Final query plan is nice and neat. Note that the union is keeping the ordering correct for the later stages.

    enter image description here

    As I mention, you need to keep the previous highest ID value, and pass it in to the next query. Put it into both halves of the union.

    SELECT TOP (3)
      t.UserId
    FROM (
        SELECT a.UserId
        FROM TableA a
        WHERE a.UserId > @previousId
        UNION ALL
        SELECT b.UserId
        FROM TableB b
        WHERE b.UserId > @previousId
    ) t
    GROUP BY t.UserId
    ORDER BY UserId;
    

    db<>fiddle