I have a jsonb column, created via a Sequelize migration this way:
queryInterface.addColumn('Entities', 'summary', {
  type: Sequelize.DataTypes.JSONB,
})
My Postgres version:
PostgreSQL 15.2 (Debian 15.2-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
My rows generally have this kind of structure:
{
  summary: {
    foobar: 123.456
  }
}
I need a query that compares a parameter to a property of that JSONB summary:
where: {
  summary: {
    foobar: {
      [Sequelize.Op.lt]: parseFloat(maxFoobar),
    },
  },
}
Query generated by Sequelize:
SELECT "id", "name", "type", "geometry", "summary", "createdAt", "updatedAt"
FROM "Entities" AS "Entity"
WHERE CAST(("Entity"."summary"#>>'{foobar}') AS DOUBLE PRECISION) < 64
ORDER BY "Entity"."createdAt" DESC
LIMIT 100
OFFSET 0;
With 150,000 rows, this is taking 300-400ms:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=29685.31..29696.98 rows=100 width=1475) (actual time=81.465..85.319 rows=100 loops=1)
Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
Buffers: shared hit=91259 read=17301
-> Gather Merge (cost=29685.31..34772.56 rows=43602 width=1475) (actual time=81.461..85.310 rows=100 loops=1)
Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=91259 read=17301
-> Sort (cost=28685.29..28739.79 rows=21801 width=1475) (actual time=76.879..76.889 rows=50 loops=3)
Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
Sort Key: "Entity"."createdAt" DESC
Sort Method: top-N heapsort Memory: 309kB
Buffers: shared hit=91259 read=17301
Worker 0: actual time=75.047..75.062 rows=65 loops=1
Sort Method: top-N heapsort Memory: 353kB
Buffers: shared hit=29140 read=5589
Worker 1: actual time=75.044..75.055 rows=63 loops=1
Sort Method: top-N heapsort Memory: 351kB
Buffers: shared hit=29074 read=5532
-> Parallel Seq Scan on public."Entities" "Entity" (cost=0.00..27852.07 rows=21801 width=1475) (actual time=0.184..68.223 rows=20705 loops=3)
Output: id, name, type, geometry, summary, "createdAt", "updatedAt"
Filter: ((("Entity".summary #>> '{foobar}'::text[]))::double precision < '64'::double precision)
Rows Removed by Filter: 31618
Buffers: shared hit=91145 read=17301
Worker 0: actual time=0.240..66.849 rows=20034 loops=1
Buffers: shared hit=29083 read=5589
Worker 1: actual time=0.081..66.737 rows=19791 loops=1
Buffers: shared hit=29017 read=5532
Planning:
Buffers: shared hit=40
Planning Time: 0.815 ms
How can I build an index on that JSONB key to speed up this query?
Given that your filter criterion is not very selective, scanning an index on just "createdAt"
is certainly a good option: Postgres quickly finds 100 rows to satisfy the query. Inspecting nested values of a JSON document always adds cost (especially if the value is big).
A tailored partial index cuts down on that cost. Note that its predicate should use the same expression as the generated query (#>> '{foobar}' rather than ->> 'foobar'), or the planner may not prove that the index applies:
CREATE INDEX ON "Entities" ("createdAt" DESC)
WHERE (summary #>> '{foobar}')::float8 < 64;
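One caveat: Postgres only uses a partial index when it can prove that the query's WHERE clause implies the index predicate. That works for constant thresholds at or below the one baked into the index, but not for larger (or truly dynamic) parameters. A sketch, written with the #>> form that Sequelize emits (the index predicate has to use the exact same expression):

```sql
-- Can use the partial index: (x < 32) implies (x < 64).
SELECT "id", "summary", "createdAt"
FROM "Entities"
WHERE (summary #>> '{foobar}')::float8 < 32
ORDER BY "createdAt" DESC
LIMIT 100;

-- Cannot use it: (x < 128) is NOT implied by (x < 64),
-- so this falls back to a sequential scan (or another index).
SELECT "id", "summary", "createdAt"
FROM "Entities"
WHERE (summary #>> '{foobar}')::float8 < 128
ORDER BY "createdAt" DESC
LIMIT 100;
```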
Adding a (redundant) generated column and basing the index on that would help some more:
ALTER TABLE "Entities"
ADD COLUMN foobar float8 GENERATED ALWAYS AS ((summary->>'foobar')::float8) STORED;
CREATE INDEX ON "Entities" ("createdAt" DESC)
WHERE foobar < 64;
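With the generated column in place, the filter (hand-written, or produced by a Sequelize model updated to expose the new column) can reference the plain column, which is cheap to evaluate and matches the partial index predicate. A sketch:

```sql
-- Filter on the plain generated column; no JSONB extraction or cast
-- at query time, and the partial index on "createdAt" covers the scan.
SELECT "id", "summary", "createdAt"
FROM "Entities"
WHERE foobar < 64
ORDER BY "createdAt" DESC
LIMIT 100;
```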
Depending on the complete situation (and information not disclosed), a more generic index may be good enough, and serve more purposes:
CREATE INDEX ON "Entities" ("createdAt");
If your pagination goes deep, consider upgrading from LIMIT / OFFSET to "row value comparison", a.k.a. "keyset pagination".
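A minimal sketch of keyset pagination for this query: remember the sort key of the last row of the previous page and resume strictly after it, instead of counting past OFFSET rows. The :lastCreatedAt / :lastId placeholders are hypothetical bind parameters, and "id" is added as a tie-breaker on the assumption that "createdAt" alone is not unique:

```sql
-- Row value comparison: resume after the last row seen, in DESC order.
SELECT "id", "name", "type", "geometry", "summary", "createdAt", "updatedAt"
FROM "Entities"
WHERE ("createdAt", "id") < (:lastCreatedAt, :lastId)
ORDER BY "createdAt" DESC, "id" DESC
LIMIT 100;
```

An index on ("createdAt" DESC, "id" DESC) then makes every page equally cheap, no matter how deep.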