I have two tables with the following schema:
Table A
: ColumnA, UserId,
... - rest of the schema omitted for brevityTable B
: ColumnB, UserId,
... - rest of the schema omitted for brevityThe tables can have duplicate values between them. For e.g - Table A row (<some-columnA-value>, 1, ...)
and Table B row (<some-columnB-value>, 1, ...)
, 1 being the UserId.
Now, I have an API which is used to fetch all the UserId
values from both tables. With increasing data, I want to now use pagination for this API and would like to modify the queries accordingly. There should also not be any duplicates over the pages or within a page.
How do I achieve this? Also a requirement is that I need to use keyset pagination rather than offset pagination since offset pagination gets slower as and when the offset increases.
So far, I have thought of using indexed views since there is only 1 column that I require to fetch but since the data keeps changing quite frequently and in large volumes, the overhead of maintaining the indexed view is not optimal.
Table A:
Column A | UserId |
---|---|
x | 1 |
y | 2 |
z | 3 |
w | 4 |
Table B:
Column B | UserId |
---|---|
a | 1 |
b | 3 |
c | 5 |
d | 6 |
Result (if no page size):
UserId |
---|
1 |
2 |
3 |
4 |
5 |
6 |
Result (if page size 3)
Page 1
UserId |
---|
1 |
2 |
3 |
Page 2
UserId |
---|
4 |
5 |
6 |
As I mention in my canonical post on Keyset Pagination (paging by key), the base query needs to be efficient.
The key (sic) in your case, is to use a Merge Union, group up by ID, and then take the top 3. This means that the grouping is over an already sorted set, and can use a Stream Aggregation.
So first UNION ALL
the tables together, then GROUP BY UserId
the final result and take TOP (3)
.
SELECT TOP (3)
t.UserId
FROM (
SELECT a.UserId
FROM TableA a
UNION ALL
SELECT b.UserId
FROM TableB b
) t
GROUP BY t.UserId
ORDER BY UserId;
Final query plan is nice and neat. Note that the union is keeping the ordering correct for the later stages.
As I mention, you need to keep the previous highest ID value, and pass it in to the next query. Put it into both halves of the union.
SELECT TOP (3)
t.UserId
FROM (
SELECT a.UserId
FROM TableA a
WHERE a.UserId > @previousId
UNION ALL
SELECT b.UserId
FROM TableB b
WHERE b.UserId > @previousId
) t
GROUP BY t.UserId
ORDER BY UserId;