I have a table, Customers, that has seven columns: ID, CustomerName, Address1, Address2, City, State, PostCode. I am trying to do cursor pagination via an API over the result set but the user can determine the sorting column. There is no unique constraint on any field but the ID field (which is an auto-increment). When I do the sorting on the ID field, I am able to page through the entire dataset without problem. For example, my first call to the table is
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
ORDER BY
c.Id desc
I take the 10 results, set the 11th as my cursor and then my next query would be
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where ID <= 999990
ORDER BY
c.Id desc
And so on and so forth.
If I page via a column that isn't quite as straightforward, I run into a perplexing problem:
Here is a resultset where it is ordered by the CustomerName.
So in this case, I would set my cursor to 'Customer 99999' and my SQL query would be
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
WHERE CustomerName <= 'Customer 99999'
ORDER BY
C.CustomerName DESC
and I'd get the following results:
That works fine when the results don't have duplicates in the columns. But when I sort via City, there are a lot of rows with the same city, so my thinking was that I should sort by two keys at all times: inputted column and then a unique key (ID). So my SQL would look like this:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
ORDER BY
C.City DESC, C.ID DESC
So, now my logic would be to keep track of the 11th row's sorted column value and the ID. In this case, I'd input City-99,998999 as my cursor. So my next query would be:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where c.city <= 'City-99' and ID <= 998999
ORDER BY
C.City DESC, C.ID DESC
And that works. But if I apply the same logic to a dataset that doesn't have repeating values, the logic doesn't hold.
If I were to take the 11th values from this dataset which is sorted via CustomerName:
And take the 11th rows as my cursor, my query would be the following:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where c.CustomerName <= 'Customer 99999' and ID <= 99999
ORDER BY
C.CustomerName DESC, C.ID DESC
Is there a method I am not thinking about to make a straightforward rule for the efficient moving of the cursor? It seems like sometimes doing two keys is beneficial when there are repeating values in the selected column but not beneficial when there are no repeating values.
Always sort using a "unique key" when implementing Backend Pagination.
The trivial case is when you sort by ID
, that is a key.
When sorting by another criteria that is not unique, then make it unique. For example, when sorting by City
add ID
at the end of the sorting terms, as in:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
ORDER BY
C.City DESC, C.ID DESC -- added ID here
By adding ID
at the end of the ORDER BY
clause, you make the sorting columns unique. And by doing so, the backend pagination will work as expected when retrieving the next page and so forth, as in:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where c.city < 'City-99' or c.city = 'City-99' and ID <= 998999
ORDER BY
C.City DESC, C.ID DESC
Since SQL Server does not implement "tuple inequality" the predicate in the WHERE
clause looks clunky. However, it will work well.
Note: A tuple inequality can rephrase the search predicate as (c.city, c.id) <= ('City-99', 998999)
. They are implemented in DB2, PostgreSQL, MySQL, MariaDB. Unfortunately, they are not implemented in Oracle or SQL Server. They allow the query to be optimized in much better way, if performance is a concern.