postgresqlquery-performance

Order of columns in compound indexes


I am using a compound index on a table with more than 13 million records. The index order is (center_code, created_on, status). The center_code and status both are varchar(100) not NULL and created_on is timestamp without time zone.

I read somewhere that order of indexes matter in a compound index. We have to check for number of unique values and put the one with the highest number of unique values at the first place in compound index.

The question is what can be the number of unique values for created_on? Should I put it first in the compound index?

Indexing on date column works on date basis, hour basis or second basis.

The problem is:

A simple SELECT query is taking more than 500 ms which is using just this compound index and nothing else.

Indexes on table:

Indexes:
    "pa_key" PRIMARY KEY, btree (id)
    "pa_uniq" UNIQUE CONSTRAINT, btree (wbill)
    "pa_center_code_created_on_status_idx_new" btree (center_code, created_on, status)

The query is:

EXPLAIN ANALYSE 
SELECT "pa"."wbill" 
FROM "pa" 
WHERE ("pa"."center_code" = 'IND110030AAC' 
AND "pa"."status" IN ('Scheduled') 
AND "pa"."created_on" >= '2018-10-10T00:00:00+05:30'::timestamptz);

Query Plan:

   Index Scan using pa_center_code_created_on_status_idx_new on pa  (cost=0.69..3769.18 rows=38 width=13) (actual time=5.592..15.526 rows=78 loops=1)
   Index Cond: (((center_code)::text = 'IND110030AAC'::text) AND (created_on >= '2018-10-09 18:30:00+00'::timestamp with time zone) AND ((status)::text = 'Scheduled'::text))
     Planning time: 1.156 ms
     Execution time: 519.367 ms

Any help would be highly appreciated.


Solution

  • The index scan condition reads

    (((center_code)::text = 'IND110030AAC'::text) AND
      (created_on >= '2018-10-09 18:30:00+00'::timestamp with time zone) AND
     ((status)::text = 'Scheduled'::text))
    

    but the index scan itself is only over (center_code, created_on), while the condition on status is applied as a filter.

    Unfortunately this is not visible from the execution plan, but it follows from the following rule:

    An index scan will only use conditions if the rows satisfying the conditions are next to each other in the index.

    Let's consider this example (in index order):

     center_code  | created_on          | status
    --------------+---------------------+-----------
     IND110030AAC | 2018-10-09 00:00:00 | Scheduled
     IND110030AAC | 2018-10-09 00:00:00 | Xtra
     IND110030AAC | 2018-10-10 00:00:00 | New
     IND110030AAC | 2018-10-10 00:00:00 | Scheduled
     IND110030AAC | 2018-10-11 00:00:00 | New
     IND110030AAC | 2018-10-11 00:00:00 | Scheduled
    

    You will see that the query needs the 4th and 6th row.

    PostgreSQL cannot scan the index with all three conditions, because the required rows are not next to each other. It will have to scan only with the first two conditions, because all rows satisfying those are right next to each other.

    Your rule for multi-column indexes is wrong. The columns at the left of the index have to be the ones where = is used as comparison operator in the conditions.

    The perfect index would be one on (center_code, status, created_on).