postgresqlindexingquery-optimizationpostgresql-performance

Index is not used when querying JSON array


Having a table with a metadata column of the JSON type I am failing to utilize the full power of the index.

  1. Creating a new table
CREATE TABLE IF NOT EXISTS phrases (
   phraseId BIGINT,
   projectId BIGINT,
   metadata JSON
);
  1. Filling the table with random data
DO $$
DECLARE
   names TEXT[] := ARRAY['j.doe', 'j.roe', 'd.hill'];
   components TEXT[] := ARRAY['Configuration', 'User Profile', 'Input', 'Output', 'Layout'];
   i INT;
   name TEXT;
   component_count INT;
   selected_components TEXT[];
   metadata JSON;
BEGIN
   FOR i IN 0..200000 LOOP
      name := names[FLOOR(RANDOM() * 3) + 1];
      component_count := FLOOR(RANDOM() * 3) + 1;
      selected_components := ARRAY(SELECT DISTINCT components[FLOOR(RANDOM() * 5) + 1] FROM generate_series(1, component_count));
      metadata := jsonb_build_object('author', name, 'component_list', selected_components);
        
      INSERT INTO phrases (phraseid, projectid, metadata) VALUES (i, 10101, metadata);
   END LOOP;
END $$;

The metadata entry looks like this:

{
   "author" : "j.doe",
   "component_list" : ["Configuration", "User Profile"]
}
  1. Creating an index
CREATE INDEX
   index_project_metadata
ON phrases (
   projectId, 
   (metadata->>'author'),
   (metadata::jsonb->'component_list')
);
  1. Composing the query (searching for rows for the given project, user, and one or more components)
EXPLAIN (ANALYZE, VERBOSE)
SELECT 
   count(*)
FROM
   phrases 
WHERE
   projectId = 10101 AND
   metadata->>'author' = 'j.doe' AND
   metadata::jsonb->'component_list' ?| ARRAY['Configuration']
Aggregate  (cost=1552.51..1552.52 rows=1 width=8) (actual time=79.621..79.621 rows=1 loops=1)
  Output: count(*)
  ->  Index Scan using index_project_metadata on public.phrases  (cost=0.42..1552.48 rows=10 width=0) (actual time=0.030..78.888 rows=23370 loops=1)
        Output: phraseid, projectid, metadata
        Index Cond: ((phrases.projectid = 10101) AND ((phrases.metadata ->> 'author'::text) = 'j.doe'::text))
        Filter: (((phrases.metadata)::jsonb -> 'component_list'::text) ?| '{Configuration}'::text[])
        Rows Removed by Filter: 43390
Planning Time: 0.091 ms
Execution Time: 79.643 ms

However, only projectId and author filtering are speeded up by the index. The component is still filtered by comparing values in each row. The real table is more complex and it takes much longer.

I can't change the table layout, but I have full control on indexing and querying.

Do I need to construct the index differently?


Solution

  • When using JSON you'd better use the JSONB data type. This gives you more/better indexing options, but you have to change the SELECT statement accordingly.

    ALTER TABLE phrases ALTER COLUMN metadata SET DATA TYPE jsonb;
    CREATE INDEX idx_phrases_metadata ON phrases USING GIN (metadata jsonb_path_ops);
    
    SELECT
       count(*)
    FROM
       phrases
    WHERE
        projectId = 10101
    AND metadata @> '{"author":"j.doe"}'
    AND metadata @> '{"component_list":["Configuration"]}';