A table:
CREATE TABLE events_holder (
  id serial primary key,
  version int not null,
  data jsonb not null
);
The data field can be very large (up to 100 MB) and looks like this:
{
  "id": 5,
  "name": "name5",
  "events": [
    {
      "id": 255,
      "name": "festival",
      "start_date": "2022-04-15",
      "end_date": "2023-04-15",
      "values": [
        {
          "id": 654,
          "type": "text",
          "name": "importance",
          "value": "high"
        },
        {
          "id": 655,
          "type": "boolean",
          "name": "epic",
          "value": "true"
        }
      ]
    },
    {
      "id": 256,
      "name": "discovery",
      "start_date": "2022-02-20",
      "end_date": "2022-02-22",
      "values": [
        {
          "id": 711,
          "type": "text",
          "name": "importance",
          "value": "low"
        },
        {
          "id": 712,
          "type": "boolean",
          "name": "specificAttribute",
          "value": "false"
        }
      ]
    }
  ]
}
I want to select the data field by version, but filtered with an extra condition: only events whose end_date > '2022-03-15' should remain. The output must look like this:
{
  "id": 5,
  "name": "name5",
  "events": [
    {
      "id": 255,
      "name": "festival",
      "start_date": "2022-04-15",
      "end_date": "2023-04-15",
      "values": [
        {
          "id": 654,
          "type": "text",
          "name": "importance",
          "value": "high"
        },
        {
          "id": 655,
          "type": "boolean",
          "name": "epic",
          "value": "true"
        }
      ]
    }
  ]
}
How can I do this with maximum performance? How should I index the data field?
My primary solution:
with cte as (
  select eh.id, eh.version, jsonb_agg(events) as filteredEvents
  from events_holder eh
  cross join jsonb_array_elements(eh.data #> '{events}') as events
  where version = 1
    and (events ->> 'end_date')::timestamp >= '2022-03-15'::timestamp
  group by id, version
)
select jsonb_set(data, '{events}', cte.filteredEvents)
from events_holder, cte
where events_holder.id = cte.id;
But I don't think this is a good approach.
You can do this using a JSON path expression:
select eh.id, eh.version,
       jsonb_path_query_array(data,
         '$.events[*] ? (@.end_date.datetime() >= "2022-03-15".datetime())')
from events_holder eh
where eh.version = 1
  and eh.data @? '$.events[*] ? (@.end_date.datetime() >= "2022-03-15".datetime())'
Given your example JSON, this returns:
[
  {
    "id": 255,
    "name": "festival",
    "values": [
      {
        "id": 654,
        "name": "importance",
        "type": "text",
        "value": "high"
      },
      {
        "id": 655,
        "name": "epic",
        "type": "boolean",
        "value": "true"
      }
    ],
    "end_date": "2023-04-15",
    "start_date": "2022-04-15"
  }
]
Depending on your data distribution, either a GIN index on data or a B-tree index on version could help.
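For example (a sketch; the index names are made up, and whether either index actually pays off depends on how selective the conditions are):

-- B-tree index for the version filter
create index events_holder_version_idx on events_holder (version);

-- GIN index supporting the jsonb path operators @? and @@
create index events_holder_data_idx on events_holder using gin (data jsonb_path_ops);

Note that a GIN index is mainly useful for equality-style path predicates; a range comparison like the datetime() filter above may not be accelerated by it, so check the actual plan with EXPLAIN before relying on it.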
If you need to reconstruct the whole JSON content, but with a filtered events array, you can do something like this:
select (data - 'events') ||
       jsonb_build_object('events',
         jsonb_path_query_array(data, '$.events[*] ? (@.end_date.datetime() >= "2022-03-15".datetime())'))
from events_holder eh
...
(data - 'events')
removes the events
key from the JSON. Then the result of the JSON path query is appended back to that (partial) object.
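Putting it together, the full statement could look like this (a sketch, assuming the same version = 1 filter as in the earlier query):

select (eh.data - 'events') ||
       jsonb_build_object('events',
         jsonb_path_query_array(eh.data,
           '$.events[*] ? (@.end_date.datetime() >= "2022-03-15".datetime())'))
from events_holder eh
where eh.version = 1;

Because the top-level object minus its events key is small, the cost here is dominated by parsing the large jsonb value itself, not by the reconstruction.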