Having a table with a metadata column of the JSON type I am failing to utilize the full power of the index.
CREATE TABLE IF NOT EXISTS phrases (
phraseId BIGINT,
projectId BIGINT,
metadata JSON
);
DO $$
DECLARE
names TEXT[] := ARRAY['j.doe', 'j.roe', 'd.hill'];
components TEXT[] := ARRAY['Configuration', 'User Profile', 'Input', 'Output', 'Layout'];
i INT;
name TEXT;
component_count INT;
selected_components TEXT[];
metadata JSON;
BEGIN
FOR i IN 0..200000 LOOP
name := names[FLOOR(RANDOM() * 3) + 1];
component_count := FLOOR(RANDOM() * 3) + 1;
selected_components := ARRAY(SELECT DISTINCT components[FLOOR(RANDOM() * 5) + 1] FROM generate_series(1, component_count));
metadata := jsonb_build_object('author', name, 'component_list', selected_components);
INSERT INTO phrases (phraseid, projectid, metadata) VALUES (i, 10101, metadata);
END LOOP;
END $$;
The metadata entry looks like this:
{
"author" : "j.doe",
"component_list" : ["Configuration", "User Profile"]
}
CREATE INDEX
index_project_metadata
ON phrases (
projectId,
(metadata->>'author'),
(metadata::jsonb->'component_list')
);
EXPLAIN (ANALYZE, VERBOSE)
SELECT
count(*)
FROM
phrases
WHERE
projectId = 10101 AND
metadata->>'author' = 'j.doe' AND
metadata::jsonb->'component_list' ?| ARRAY['Configuration']
Aggregate (cost=1552.51..1552.52 rows=1 width=8) (actual time=79.621..79.621 rows=1 loops=1)
Output: count(*)
-> Index Scan using index_project_metadata on public.phrases (cost=0.42..1552.48 rows=10 width=0) (actual time=0.030..78.888 rows=23370 loops=1)
Output: phraseid, projectid, metadata
Index Cond: ((phrases.projectid = 10101) AND ((phrases.metadata ->> 'author'::text) = 'j.doe'::text))
Filter: (((phrases.metadata)::jsonb -> 'component_list'::text) ?| '{Configuration}'::text[])
Rows Removed by Filter: 43390
Planning Time: 0.091 ms
Execution Time: 79.643 ms
However, only projectId
and author
filtering are speeded up by the index. The component is still filtered by comparing values in each row. The real table is more complex and it takes much longer.
I can't change the table layout, but I have full control on indexing and querying.
Do I need to construct the index differently?
When using JSON you'd better use the JSONB data type. This gives you more/better indexing options, but you have to change the SELECT statement accordingly.
ALTER TABLE phrases ALTER COLUMN metadata SET DATA TYPE jsonb;
CREATE INDEX idx_phrases_metadata ON phrases USING GIN (metadata jsonb_path_ops);
SELECT
count(*)
FROM
phrases
WHERE
projectId = 10101
AND metadata @> '{"author":"j.doe"}'
AND metadata @> '{"component_list":["Configuration"]}';