A common solution to high performance pagination is to use an indexed field, starting each new "page" from the last value of the prior page. For example, with a dataset like this (assuming Category and ID are the primary key):
Category | ID | Name
Red | 10 | Bob Jones
Red | 14 | Sam Smith
Red | 16 | Jill White
Blue | 10 | Mike Green
Blue | 16 | Mary Brown
Assuming a (rather small) page size of 1, if we want to return all of the Red
category records (assume ORDER BY Category, ID):
SELECT * FROM table WHERE Category='Red' AND ID>'00' (1st page, returns Bob Jones)
SELECT * FROM table WHERE Category='Red' AND ID>'10' (2nd page, returns Sam Smith)
SELECT * FROM table WHERE Category='Red' AND ID>'14' (3rd page, returns Jill White)
This works because by pagination "keyset" is using only the ID field (and it would also work on multiple fields if ID were globally unique, which it is not).
But if I want to return all the Red and Blue records (assuming that the table also contains other Categories), still one page at a time (assume ORDER BY Category, ID):
SELECT * FROM table WHERE Category IN ['Red', 'Blue'] AND Category>'' AND ID>'00' (1st page, returns Bob Jones)
SELECT * FROM table WHERE Category IN ['Red', 'Blue'] AND Category>'Red' AND ID>'10' (2nd page, returns Sam Smith, but skips Mike Green)
In PostgreSQL and some others, there is a "row values" predicate syntax that supports this (assume ORDER BY Category, ID):
SELECT * FROM table WHERE (Category, ID) > ('', '00') (1st page, returns Bob Jones)
SELECT * FROM table WHERE (Category, ID) > ('Red', '10') (2nd page, returns Sam Smith)
It works because both Category and ID are treated as a single compound value for the purpose of the test. But I'm not using PostgreSQL or a database that supports "row values". So the question is if there is an alternative solution that would work for this (whether there are 2 or n fields)? For it to work for pagination on multiple variable fields, I need to device a predicate that will always find the "next record" in the multi-field sort order.
PS: OFFSET/LIMIT or SKIP/LIMIT pagination works of course, but neither is efficient on large data sets, which is why I'm trying to use "keyset" pagination.
You can always phrase the predicate:
(x, y) > (a, b)
as:
x >= a and (x = a and y > b or x > a)
Note the first preficate x >= a
promotes (it doesn't ensure) the usage of an index on that column. That is, it becomes an "access predicate". The second one x = a and y > b or x > a
filters out the rows in excess, effectively becoming a "filtering predicate".
This way of phrasing "tuple inequality" predicates promotes the usage of indexes. However, they become increasingly complex if you are comparing 3, 4, or more columns.