sqlsql-serverdatabase-cursor

How can I do cursor pagination with string based columns and integer values?


I have a table, Customers, that has seven columns: ID, CustomerName, Address1, Address2, City, State, PostCode. I am trying to do cursor pagination via an API over the result set but the user can determine the sorting column. There is no unique constraint on any field but the ID field (which is an auto-increment). When I do the sorting on the ID field, I am able to page through the entire dataset without problem. For example, my first call to the table is

SELECT TOP(11) 
c.*
FROM [Customers] AS [c]
ORDER BY 
c.Id desc

enter image description here I take the 10 results, set the 11th as my cursor and then my next query would be

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
where ID <= 999990
ORDER BY 
c.Id desc

Second Cursor

And so on and so forth.

If I page via a column that isn't quite as straightforward, I run into a perplexing problem:

Here is a resultset where it is ordered by the CustomerName.

Third Cursor

So in this case, I would set my cursor to 'Customer 99999' and my SQL query would be

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
WHERE CustomerName <= 'Customer 99999'
ORDER BY 
C.CustomerName DESC

and I'd get the following results: Top 11 customers by name

That works fine when the results don't have duplicates in the columns. But when I sort via City, there are a lot of rows with the same city, so my thinking was that I should sort by two keys at all times: inputted column and then a unique key (ID). So my SQL would look like this:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
ORDER BY 
C.City DESC, C.ID DESC

Top 11 Customers by City

So, now my logic would be to keep track of the 11th row's sorted column value and the ID. In this case, I'd input City-99,998999 as my cursor. So my next query would be:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
where c.city <= 'City-99' and ID <= 998999
ORDER BY 
C.City DESC, C.ID DESC

And that works. But if I apply the same logic to a dataset that doesn't have repeating values, the logic doesn't hold.

If I were to take the 11th values from this dataset which is sorted via CustomerName:

Trouble Dataset

And take the 11th rows as my cursor, my query would be the following:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
where c.CustomerName <= 'Customer 99999' and ID <= 99999
ORDER BY 
C.CustomerName DESC, C.ID DESC

Is there a method I am not thinking about to make a straightforward rule for the efficient moving of the cursor? It seems like sometimes doing two keys is beneficial when there are repeating values in the selected column but not beneficial when there are no repeating values.


Solution

  • Always sort using a "unique key" when implementing Backend Pagination.

    The trivial case is when you sort by ID, that is a key.

    When sorting by another criteria that is not unique, then make it unique. For example, when sorting by City add ID at the end of the sorting terms, as in:

    SELECT TOP(11) 
        c.*
    FROM [Customers] AS [c]
    ORDER BY 
    C.City DESC, C.ID DESC -- added ID here
    

    By adding ID at the end of the ORDER BY clause, you make the sorting columns unique. And by doing so, the backend pagination will work as expected when retrieving the next page and so forth, as in:

    SELECT TOP(11) 
        c.*
    FROM [Customers] AS [c]
    where c.city < 'City-99' or c.city = 'City-99' and ID <= 998999
    ORDER BY 
    C.City DESC, C.ID DESC
    

    Since SQL Server does not implement "tuple inequality" the predicate in the WHERE clause looks clunky. However, it will work well.

    Note: A tuple inequality can rephrase the search predicate as (c.city, c.id) <= ('City-99', 998999). They are implemented in DB2, PostgreSQL, MySQL, MariaDB. Unfortunately, they are not implemented in Oracle or SQL Server. They allow the query to be optimized in much better way, if performance is a concern.