sqlpostgresqlbackend

SQL query order by and limit taking a lot of time


I am querying a table with around 4M data with two types of skuID and using order by on version as there are around avg 5k versions of data for one skuID and fetching the top most versions using limit.

Query:

 select * FROM table rb 
    WHERE rb.sku_id='' or rb.package_sku_id=''
    order by version desc
limit 1;

By using explain on query we found out that ORDER BY and LIMIT is taking most of the cost for the query:

Limit  (cost=0.43..5304.64 rows=1 width=861) (actual time=50327.036..50327.041 rows=1 loops=1)
  Buffers: shared hit=361280 read=104302 written=18
  I/O Timings: read=40363.693 write=0.215
  ->  Index Scan Backward using "IDX488yr43nr28a1yml9lb5i7jfv" on referral_benefits rb  (cost=0.43..9552890.48 rows=1801 width=861) (actual time=50327.028..50327.028 rows=1 loops=1)
        Filter: (((sku_id)::text = 'b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5'::text) OR ((package_sku_id)::text = 'b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5'::text))
        Rows Removed by Filter: 1361027
        Buffers: shared hit=361280 read=104302 written=18
        I/O Timings: read=40363.693 write=0.215
Planning Time: 1.121 ms
Execution Time: 50329.843 ms

We have added indexes on skuId and package_sku_id combined but that is not reducing the time.

Need guidance on this. Thanks in advance


Solution

  • If you have two indexes, one on (sku_id, version) and one on (package_sku_id, version), then you could get very fast execution by doing:

    (select * FROM rb WHERE rb.sku_id='b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5' order by version desc) 
        union all
    (select * FROM rb WHERE rb.package_sku_id='b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5' order by version desc)
    order by version desc limit 1;
    

    But note that if you increase the LIMIT beyond 1, then this query might return duplicate rows if both SKU conditions are satisfied for the same row.