I've inherited a code base with an associated database schema.
The schema has numeric, auto-incrementing primary keys. It does pagination with queries like this:
WITH params AS
(
SELECT id
FROM table AS first_id
WHERE uuid = "{last_uuid}"
)
SELECT *
FROM table
WHERE id > params.first_id
ORDER BY id
LIMIT 100
or words to that general effect.
I've added some new tables that use the UUID as their primary key rather than having separate "public" UUID keys and internal integer IDs. But now I'm wondering how to paginate them.
I could do it like this:
SELECT *
FROM table
ORDER BY uuid
OFFSET n ROWS
FETCH NEXT 100 ROWS ONLY
Someone has objected that this has performance implications. My brief investigation suggests that this is true; the time for the first method is constant (about 7 ms in my example) while the time for the second method appears to be O(n)
in the number of rows offset - an offset of 1.45 million rows takes about 200 ms. I suspect that the rows are all still cached in RAM at that point. As the table gets bigger, performance starts to decline fairly rapidly; with 1.85 million rows, the query takes nearly 600 ms.
Now, at some level, I'm not sure I care about that kind of performance. The chances that I'll have to paginate 1+ million rows of data in my current application are slim.
But is there a good way to paginate data like this that doesn't have a natural ordering?
Database is MariaDB if that makes a big difference, though I'd generally be interested in answers for most major RDBMSs.
The PRIMARY KEY
is the most efficient ordering for SELECT *
in Engine=InnoDB. This statement is valid regardless of whether the PK is a UUID or auto_inc or something else.
"Remembering where you left off" is the most efficient way to get to the 'next' page. See Pagination
OFFSET
can lead to hiccups with duplicated/missing rows when moving from one page to the next.
UUID keys are inherently inefficient because they scatter the data around the table. See UUIDs
You should be using InnoDB.
Using no index will be terrible for your task.
WHERE uuid > some_uuid ORDER BY uuid
begs for INDEX(uuid)
or, even better, PRIMARY KEY(uuid)
.
A binary UUID (see link) will save 36-16 bytes per table and per index containing the UUID. (That may not matter for only a few million rows.)
Regardless, you must have an ordering to the rows.