I would need help in re-writing query to improve performance. I think below query is slow because of OR part and two sub queries for products
table (so twice scanning) for the each key in category
table.
SELECT key
FROM category c
WHERE (1=1)
AND ( ((EXISTS (SELECT * from products p
WHERE p.attribute_key=2
AND p.category_key=c.key
AND ((value && CAST(ARRAY['Active', 'active'] AS text[])))
AND p.status='active'
))
OR(NOT EXISTS(SELECT * from products p
WHERE p.attribute_key=2
AND p.category_key=c.key
AND p.status='active')
))
)
AND c.status='active'
AND c.type_key=4
expected output is (9 rows) below
key
1
13
3
6
2
7
4
10
15
Query returns keys from category table if type_key=4 and category.status='active' and
products table has value='Active' or 'active' and attribute_key=2 and status='active. (this is EXISTS
part in query)
even if products table doesn't have any attribute_key or has attribute_key!=2 records for
type_key=4.
(this is OR( NOT EXISTS)
part in query)
Example: type_key 13 and 3
Query explain analyze plan is below. indexes are used well.
I hope query can be improved by writing another way or changing OR part in query
sample data is here dbfiddle
Aggregate (cost=3156220.03..3156220.04 rows=1 width=8) (actual time=86100.329..86100.342 rows=1 loops=1)
-> Index Scan using category_type_key_status on category c (cost=0.43..3155906.64 rows=125355 width=0) (actual time=12.618..85925.747 rows=120852 loops=1)
Index Cond: ((type_key = 4) AND (status = 'active'))
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (SubPlan 3)))
Rows Removed by Filter: 86879
SubPlan 1
-> Index Scan using products_category_key_attribute_key_status on products p (cost=0.56..8.59 rows=1 width=0) (actual time=0.332..0.332 rows=1 loops=207731)
Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
Filter: (value && '{Active,active}'::text[])
Rows Removed by Filter: 0
SubPlan 2
-> Gather (cost=1000.00..1155110.30 rows=8916 width=4) (never executed)
Workers Planned: 2
Workers Launched: 0
-> Parallel Seq Scan on products p_1 (cost=0.00..1153218.70 rows=3715 width=4) (never executed)
Filter: ((value && '{Active,active}'::text[]) AND (attribute_key = 2) AND (status = 'active'))
SubPlan 3
-> Index Only Scan using products_category_key_attribute_key_status on products p_2 (cost=0.56..8.58 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=86933)
Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
Heap Fetches: 11497
Planning Time: 35.808 ms
Planning Time: 35.808 ms
Both of the predicates on the products table are almost exactly the same. You can simply join this table once and use OR to apply either of the conditions:
SELECT c.key
FROM category c
LEFT OUTER
JOIN products p
ON p.category_key = c.key
AND p.status = 'active'
AND p.attribute_key = 2
WHERE c.status='active'
AND c.type_key=4
AND ( p.category_key IS NULL -- NOT EXISTS
OR ((value && CAST(ARRAY['Active', 'active'] AS text[])))) -- or value matches