postgresqlexistspostgresql-performancewhere-innot-exists

postgresql re-write query in another way


I would need help in re-writing query to improve performance. I think below query is slow because of OR part and two sub queries for products table (so twice scanning) for the each key in category table.

SELECT key
FROM category c
WHERE (1=1)
AND (  ((EXISTS (SELECT * from products p
                  WHERE p.attribute_key=2 
                   AND p.category_key=c.key 
                   AND ((value && CAST(ARRAY['Active', 'active'] AS text[])))
                   AND p.status='active'
        ))
     OR(NOT EXISTS(SELECT * from products p
                    WHERE p.attribute_key=2 
                     AND p.category_key=c.key 
                     AND p.status='active') 
        ))
    )
AND c.status='active'
AND c.type_key=4

expected output is (9 rows) below

key
1
13
3
6
2
7
4
10
15

Query returns keys from category table if type_key=4 and category.status='active' and

  1. products table has value='Active' or 'active' and attribute_key=2 and status='active. (this is EXISTS part in query)

  2. even if products table doesn't have any attribute_key or has attribute_key!=2 records for type_key=4. (this is OR( NOT EXISTS) part in query) Example: type_key 13 and 3

Query explain analyze plan is below. indexes are used well.

I hope query can be improved by writing another way or changing OR part in query

sample data is here dbfiddle

Aggregate  (cost=3156220.03..3156220.04 rows=1 width=8) (actual time=86100.329..86100.342 rows=1 loops=1)
  ->  Index Scan using category_type_key_status on category c  (cost=0.43..3155906.64 rows=125355 width=0) (actual time=12.618..85925.747 rows=120852 loops=1)
        Index Cond: ((type_key = 4) AND (status = 'active'))
        Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (SubPlan 3)))
        Rows Removed by Filter: 86879
        SubPlan 1
          ->  Index Scan using products_category_key_attribute_key_status on products p  (cost=0.56..8.59 rows=1 width=0) (actual time=0.332..0.332 rows=1 loops=207731)
                Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
                Filter: (value && '{Active,active}'::text[])
                Rows Removed by Filter: 0
        SubPlan 2
          ->  Gather  (cost=1000.00..1155110.30 rows=8916 width=4) (never executed)
                Workers Planned: 2
                Workers Launched: 0
                ->  Parallel Seq Scan on products p_1  (cost=0.00..1153218.70 rows=3715 width=4) (never executed)
                      Filter: ((value && '{Active,active}'::text[]) AND (attribute_key = 2) AND (status = 'active'))
        SubPlan 3
          ->  Index Only Scan using  products_category_key_attribute_key_status on products p_2  (cost=0.56..8.58 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=86933)
                Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
                Heap Fetches: 11497
Planning Time: 35.808 ms
Planning Time: 35.808 ms


Solution

  • Both of the predicates on the products table are almost exactly the same. You can simply join this table once and use OR to apply either of the conditions:

    SELECT c.key
      FROM category c
      LEFT OUTER
      JOIN products p
        ON p.category_key = c.key
       AND p.status = 'active'
       AND p.attribute_key = 2
     WHERE c.status='active'
       AND c.type_key=4
       AND (   p.category_key IS NULL -- NOT EXISTS
            OR ((value && CAST(ARRAY['Active', 'active'] AS text[])))) -- or value matches
                      
    

    dbfiddle with results