postgresql pagination sequelize.js jsonb postgresql-performance

Indexing strategy for filters including a JSONB property

I have a jsonb column created via sequelize migration this way:

queryInterface.addColumn('Entities', 'summary', {
  type: Sequelize.DataTypes.JSONB,
})

My Postgres version:

PostgreSQL 15.2 (Debian 15.2-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

My rows generally have this kind of structure:

{ 
  summary: {
    foobar: 123.456
  }
}

I need a query that compares a parameter to a property of that JSONB summary:

where: {
  summary: {
    foobar: {
      [Sequelize.Op.lt]: parseFloat(maxFoobar),
    },
  },
}

Query generated by Sequelize:

SELECT "id", "name", "type", "geometry", "summary", "createdAt", "updatedAt"
FROM "Entities" AS "Entity"
WHERE CAST(("Entity"."summary"#>>'{foobar}') AS DOUBLE PRECISION) < 64
ORDER BY "Entity"."createdAt" DESC
LIMIT 100
OFFSET 0;

With 150,000 rows, this is taking 300-400ms:

                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=29685.31..29696.98 rows=100 width=1475) (actual time=81.465..85.319 rows=100 loops=1)
   Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
   Buffers: shared hit=91259 read=17301
   ->  Gather Merge  (cost=29685.31..34772.56 rows=43602 width=1475) (actual time=81.461..85.310 rows=100 loops=1)
         Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=91259 read=17301
         ->  Sort  (cost=28685.29..28739.79 rows=21801 width=1475) (actual time=76.879..76.889 rows=50 loops=3)
               Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
               Sort Key: "Entity"."createdAt" DESC
               Sort Method: top-N heapsort  Memory: 309kB
               Buffers: shared hit=91259 read=17301
               Worker 0:  actual time=75.047..75.062 rows=65 loops=1
                 Sort Method: top-N heapsort  Memory: 353kB
                 Buffers: shared hit=29140 read=5589
               Worker 1:  actual time=75.044..75.055 rows=63 loops=1
                 Sort Method: top-N heapsort  Memory: 351kB
                 Buffers: shared hit=29074 read=5532
               ->  Parallel Seq Scan on public."Entities" "Entity"  (cost=0.00..27852.07 rows=21801 width=1475) (actual time=0.184..68.223 rows=20705 loops=3)
                     Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
                     Filter: ((("Entity".summary #>> '{foobar}'::text[]))::double precision < '64'::double precision)
                     Rows Removed by Filter: 31618
                     Buffers: shared hit=91145 read=17301
                     Worker 0:  actual time=0.240..66.849 rows=20034 loops=1
                       Buffers: shared hit=29083 read=5589
                     Worker 1:  actual time=0.081..66.737 rows=19791 loops=1
                       Buffers: shared hit=29017 read=5532
 Planning:
   Buffers: shared hit=40
 Planning Time: 0.815 ms

How can I build an index on that JSONB key to run the query?

Solution

Given that your filter criteria is not very selective, scanning an index on just "createdAt" is certainly a good option. Postgres will find 100 rows to satisfy the query quickly. Inspecting nested values of a JSON document always adds cost (especially if the value is big).

This tailored partial index cuts down on this cost:

CREATE INDEX ON "Entities" ("createdAt" DESC)
WHERE (summary->>'foobar')::float8 < 64;

Adding a (redundant) generated column, with the index based on that would help some more:

ALTER TABLE "Entities"
ADD COLUMN foobar float8 GENERATED ALWAYS AS ((summary->>'foobar')::float8) STORED;

CREATE INDEX ON "Entities" ("createdAt" DESC)
WHERE foobar < 64;

See:

Computed / calculated / virtual / derived / generated columns in PostgreSQL

Depending on the complete situation and missing information, a more generic index may be good enough - and serve more purposes:

CREATE INDEX ON "Entities" ("createdAt");

If your pagination goes deep, consider upgrading from LIMIT / OFFSET to "ROW value comparsion", a.k.a. "keyset pagination". See:

Optimize query with OFFSET on large table