performancesap-ase

Puzzling performance of index scan. Why is scanning index slow even though result set is small and indexed


I have large table ir_data (150GB) which contains data for different dates (column val_date). I need to know if a given date is available in the ir_data at various points in my application.

select distinct(val_date) from ir_data 

I the following experiment ir_data contains 29 different values for val_date.


SETUP 1

I expected an index on ir_data (val_date, key_id, other_colum) to help finding the 29 values quickly. In fact this takes more than 5 minutes:

Query 1 of 1, Rows read: 29, Elapsed time (seconds) - Total: 343.96, SQL query: 343.958, Reading results: 0.002

I always expected an index to be a tree where the nodes are stored in a tree structure, e.g. like this

val_date -> key_id   -> other_column -> data-nodes

1.1.2017 -> 0-50     -> A            -> (1.1.2017, 0, Automobile), (1.1.2017, 2, Amsterdam)
                     -> B-E          -> (1.1.2017, 12, Batman)
         -> 51-100   -> A            -> ...
                        X
         -> 666-1000 -> A
                     -> B-C
                     -> E
2.1.2017 -> ...

Based on this structure getting the 29 different val_dates should be very quick.

Question: Why does this take so long???

Sub-question: Is there a way to fix this without creating another table?


SETUP 2

I created another index which only contains val_date. It takes about the same amount of time.


Query-Plan:

    The type of query is SELECT.

2 operator(s) under root

   |ROOT:EMIT Operator (VA = 2)
   |
   |   |GROUP SORTED Operator (VA = 1)
   |   |Distinct
   |   |
   |   |   |SCAN Operator (VA = 0)
   |   |   |  FROM TABLE
   |   |   |  ir_data
   |   |   |  Index : ir_data_idx1 <-- this is the index containing only val_date.
   |   |   |  Forward Scan.
   |   |   |  Positioning at index start.
   |   |   |  Index contains all needed columns. Base table will not be read.
   |   |   |  Using I/O Size 16 Kbytes for index leaf pages.
   |   |   |  With MRU Buffer Replacement Strategy for index leaf pages.

Solution

  • Your table is very large as well as your index. As you can see in the plan, the engine performs an index scan. This operation will be lengthy because it will scan your entire index for distinct values.

    As a first step, You could try update index statistics on the index, though, i don't really think it'll help.

    If it's a one time manual operation, i guess you'll be satisfied with the 5 min operation.

    If it's a query executed by your application then you have 2 choices i can think of:

    1. As you said in the question - normalize the table by creating additional table for the dates and use FK.
    2. Create a precomputed result set. This is a materialized view - the results are stored like a regular table (as opposed to a view where only its definition is stored).
      It will automatically refresh the results in the view and values will be retrieved fast.
      Important: like indexes, it will have an impact on the performance of Insert, Update...
      It can look like:

      create precomputed result set prs_ir_data
      immediate refresh
      as
      select distinct val_date
      from ir_data
      

    You can read here and here about precomputed result set