I'm experimenting with keeping values like the following in a jsonb column in Postgres 9.4:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
I'm executing queries like:
SELECT * FROM locations
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(events) AS e
WHERE (
e->>'event_slug' = 'test_1' AND
(
e->>'start_time' >= '2014-10-30 14:04:06 -0400' OR
e->>'end_time' >= '2014-10-30 14:04:06 -0400'
)
)
)
How would I create an index on that data that queries like the above can use? Does this sound like a reasonable design for a few million rows that each contain ~10 events in that column?
It's worth noting that I still seem to get sequential scans with:
CREATE INDEX events_gin_idx ON some_table USING GIN (events);
which I'm guessing is because the first thing the query does is convert the data to JSON array elements.
First of all, you cannot access JSON array values like that. For a given json value:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
A valid test against the first element of the array in the events column would be:
WHERE events->0->>'event_slug' = 'test_1'
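To see the difference, both expressions can be run against a literal copy of the sample value. This is only a sketch: the derived column v is a hypothetical stand-in for the events column.

```sql
-- v is a hypothetical stand-in for the events column, holding the sample value
SELECT v->>'event_slug'    AS key_on_array   -- NULL: a JSON array has no keys
     , v->0->>'event_slug' AS first_element  -- 'test_1'
FROM  (SELECT '[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
                {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
                {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]'::jsonb AS v) sub;
```

A key lookup on an array does not raise an error; it silently returns NULL, which is why such a predicate simply never matches.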
But you probably don't want to limit your search to the first element of the array. With the jsonb
data type you have additional operators and index support.
At the time of asking, there was no built-in way to apply "greater than" or "less than" comparisons to values nested inside a jsonb column. This changed with the SQL/JSON path functionality added in Postgres 12.
You can choose between two operator classes for your GIN index. The manual lists the operators each one supports:
jsonb_ops
@> (jsonb,jsonb)
@? (jsonb,jsonpath)
@@ (jsonb,jsonpath)
? (jsonb,text)
?| (jsonb,text[])
?& (jsonb,text[])
jsonb_path_ops
@> (jsonb,jsonb)
@? (jsonb,jsonpath)
@@ (jsonb,jsonpath)
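(jsonb_ops is the default.) You can cover the equality test with either one, but your requirement for a >= comparison can only be met with a jsonpath operator. (You need a btree index in older versions.) Even then, the GIN index only supports the equality conditions inside the jsonpath expression; the >= condition is checked against the rows the index finds.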
CREATE INDEX locations_events_gin_idx ON locations
USING gin (events jsonb_path_ops);
SELECT l.*
FROM locations l
WHERE l.events @? '$[*] ? (@.event_slug == "test_1")
? (@.end_time.datetime() >= "2014-10-13".datetime())'
Or, if you really need to "OR" two filters (see below):
SELECT l.*
FROM locations l
WHERE l.events @? '$[*] ? (@.event_slug == "test_1")
? (@.start_time.datetime() >= "2014-10-13".datetime() || @.end_time.datetime() >= "2014-10-13".datetime())'
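Assuming Postgres 12 or later, the jsonpath filter can be tried on the sample value without creating a table. A minimal sketch; the derived column v stands in for the events column, and the two cutoff dates are arbitrary examples chosen around the end_time of test_1 (2014-10-12).

```sql
-- Requires Postgres 12+ (jsonpath). v is a hypothetical stand-in for the events column.
SELECT v @? '$[*] ? (@.event_slug == "test_1")
                  ? (@.end_time.datetime() >= "2014-10-10".datetime())' AS ends_on_or_after_oct_10  -- true
     , v @? '$[*] ? (@.event_slug == "test_1")
                  ? (@.end_time.datetime() >= "2014-10-13".datetime())' AS ends_on_or_after_oct_13  -- false
FROM  (SELECT '[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
                {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
                {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]'::jsonb AS v) sub;
```

The first filter keeps only the test_1 element, the second keeps it only if its end_time passes the cutoff; @? is true if anything is left. Both sides of each comparison are plain dates, so no time zone is involved.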
This is much simpler than my original answer for older versions (before Postgres 12), which follows.
SELECT * FROM locations WHERE events @> '[{"event_slug":"test_1"}]';
This containment query can use a GIN index with either operator class, and it might be good enough on its own if the filter is selective enough.
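To illustrate what @> checks here: the right-hand array is contained if each of its elements is contained in some element of the left-hand array, so the partial object {"event_slug":"test_1"} matches any event with that slug, whatever its other keys. A sketch against the sample value, with v again a hypothetical stand-in for the events column.

```sql
-- v is a hypothetical stand-in for the events column, holding the sample value
SELECT v @> '[{"event_slug":"test_1"}]' AS has_test_1   -- true
     , v @> '[{"event_slug":"test_9"}]' AS has_test_9   -- false
     , v @> '{"event_slug":"test_1"}'   AS object_rhs   -- false: the right operand must be an array, too
FROM  (SELECT '[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
                {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
                {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]'::jsonb AS v) sub;
```

The last column shows a common mistake: a bare object is not considered contained in an array of objects, so the search pattern has to keep the surrounding brackets.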
Assuming end_time >= start_time, we don't need two checks: if start_time is past the cutoff, end_time is as well. Checking only end_time is cheaper and equivalent:
SELECT l.*
FROM locations l
, jsonb_array_elements(l.events) e
WHERE l.events @> '[{"event_slug":"test_1"}]'
AND e->>'event_slug' = 'test_1'
AND (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;
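The @> condition lets the index narrow down the rows; the extra check on e->>'event_slug' makes sure the end_time test applies to the matching element, not to some other event in the same array. The comma join is an implicit JOIN LATERAL: a function call in the FROM list can reference columns of preceding FROM items (here l.events) even without the LATERAL key word.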
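The claim that a single end_time check suffices can be verified by brute force. In this sketch, small integers stand in for the timestamps (an assumption that is harmless because only their ordering matters), and cutoff plays the role of the timestamp literal in the query.

```sql
-- Integers 1..5 stand in for timestamps; only their ordering matters here.
-- Count cases where "start_time >= cutoff OR end_time >= cutoff"
-- differs from "end_time >= cutoff", given start_time <= end_time.
SELECT count(*) AS counterexamples  -- 0
FROM   generate_series(1, 5) s(start_time)
CROSS  JOIN generate_series(1, 5) e(end_time)
CROSS  JOIN generate_series(1, 5) x(cutoff)
WHERE  e.end_time >= s.start_time
AND   ((s.start_time >= x.cutoff OR e.end_time >= x.cutoff)
        IS DISTINCT FROM (e.end_time >= x.cutoff));
```

Dropping the e.end_time >= s.start_time condition produces counterexamples (for instance start_time = 5, end_time = 1, cutoff = 3), so the simplification depends on that assumption holding for your data.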
Careful with the different data types! What you have in the JSON value looks like timestamp [without time zone], while your predicates use timestamp with time zone literals. The timestamp value is interpreted according to the current time zone setting, while the given timestamptz literals must be cast to timestamptz explicitly, or the time zone offset would be ignored! The query above should work as desired.
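A minimal sketch of that pitfall, assuming the session time zone is set to UTC so the results are reproducible (any other setting shifts the timestamptz values accordingly). The 16:00 value is a hypothetical end_time taken from the JSON data.

```sql
SET timezone = 'UTC';  -- fix the session time zone so the results below are reproducible

SELECT '2014-10-30 14:04:06 -0400'::timestamp   AS offset_dropped  -- 2014-10-30 14:04:06
     , '2014-10-30 14:04:06 -0400'::timestamptz AS offset_applied  -- 2014-10-30 18:04:06+00
       -- Untyped literal: coerced to timestamp, so the -0400 offset is silently ignored
     , '2014-10-30 16:00'::timestamp >= '2014-10-30 14:04:06 -0400'              AS untyped  -- true
       -- Explicit cast: 16:00 is read as 16:00 UTC and compared with 18:04:06 UTC
     , '2014-10-30 16:00'::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz AS typed    -- false
;
```

The two comparisons differ only in the cast, yet return opposite results, which is why the explicit ::timestamptz in the query matters.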
If the above is not good enough, I would consider a MATERIALIZED VIEW
that stores relevant attributes in normalized form. This allows plain btree indexes.
The code assumes that your JSON values have a consistent format as displayed in the question.
Setup:
CREATE TYPE event_type AS (
event_slug text
, start_time timestamp
, end_time timestamp
);
CREATE MATERIALIZED VIEW loc_event AS
SELECT l.location_id, e.event_slug, e.end_time -- start_time not needed
FROM locations l, jsonb_populate_recordset(null::event_type, l.events) e;
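To see what the view stores per event, the expansion can be tried on the sample value. This sketch swaps in jsonb_to_recordset() with a column definition list, which does the same job as jsonb_populate_recordset() but needs no separate composite type, so it runs without the event_type setup.

```sql
-- Expand the sample array into one typed row per event
SELECT e.event_slug, e.start_time, e.end_time
FROM   jsonb_to_recordset(
         '[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
           {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
           {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]'::jsonb
       ) AS e(event_slug text, start_time timestamp, end_time timestamp);
-- Three rows; the date strings become timestamp values at midnight, e.g. 2014-10-12 00:00:00
```

Because the values are cast to timestamp on the way in, the btree index on the view can serve range conditions on end_time directly.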
CREATE INDEX loc_event_idx ON loc_event (event_slug, end_time, location_id);
The index also includes location_id to allow index-only scans. (See the manual page and the Postgres Wiki.)
Query:
SELECT *
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz;
Or, if you need full rows from the underlying locations
table:
SELECT l.*
FROM (
SELECT DISTINCT location_id
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz
) le
JOIN locations l USING (location_id);