openedgeprogress-db

How to SELECT a single record in table X with the largest value for X.a WHERE values for fields X.b & X.c are specified


I am using the following query to obtain the current component serial number (tr_sim_sn) installed on the host device (tr_host_sn) from the most recent record in a transaction history table (PUB.tr_hist)

SELECT tr_sim_sn FROM PUB.tr_hist   
WHERE tr_trnsactn_nbr = (SELECT max(tr_trnsactn_nbr) 
                 FROM PUB.tr_hist 
                 WHERE tr_domain = 'vattal_us' 
                 AND tr_lot = '99524136' 
                 AND tr_part = '6684112-001') 

The actual table has ~190 million records. The excerpt below contains only a few sample records, and only fields relevant to the search to illustrate the query above:

tr_sim_sn      |tr_host_sn* |tr_host_pn     |tr_domain  |tr_trnsactn_nbr |tr_qty_loc
_______________|____________|_______________|___________|________________|___________
...            |               
356136072015140|99524135    |6684112-000    |vattal_us  |178415271       |-1.0000000000
356136072015458|99524136    |6684112-001    |vattal_us  |178424418       |-1.0000000000
356136072015458|99524136    |6684112-001    |vattal_us  |178628048       |1.0000000000
356136072015050|99524136    |6684112-001    |vattal_us  |178628051       |-1.0000000000
356136072015836|99524137    |6684112-005    |vattal_us  |178645337       |-1.0000000000
...

* = key field

The excerpt illustrates multiple occurrences of tr_trnsactn_nbr for a single value of tr_host_sn. The largest value for tr_trnsactn_nbr corresponds to the current tr_sim_sn installed within tr_host_sn.

This query works, but it is very slow, ~8minutes.

I would appreciate suggestions to improve or refactor this query to improve its speed.


Solution

  • I am posting this as a response to my request for an improved query.

    As it turns out, the following syntax features two distinct features that greatly improved the speed of the query. One is to include tr_domain search criteria in both main and nested portions of the query. Second is to narrow the search by increasing the number of search criteria, which in the following are all included in the nested section of the syntax:

    SELECT tr_sim_sn, 
    FROM  PUB.tr_hist 
    WHERE tr_domain = 'vattal_us' 
      AND tr_trnsactn_nbr IN (
          SELECT MAX(tr_trnsactn_nbr) 
          FROM   PUB.tr_hist 
          WHERE tr_domain  = 'vattal_us' 
            AND tr_part    = '6684112-001' 
            AND tr_lot     = '99524136' 
            AND tr_type    = 'ISS-WO' 
            AND tr_qty_loc < 0)
    

    This syntax results in ~0.5s response time. (credit to my colleague, Daniel V.)

    To be fair, this query uses criteria outside the originally stated parameters that were included in the original post, making it difficult to impossible for others to attempt a reasonable answer. This omission was not on purpose of course, rather due to being fairly new to fundamentals of good query design. This query in part is a result of learning that when too-few or non-indexed fields are used as search criteria in a large table, it is sometimes helpful to narrow the search by increasing the number of search criteria items. The original had 3, this one has 5.