sqldb2-400ibm-db2

Hybrid join in DB2 z/os sql


I have been rewriting decades old DB z/OS queries and there are three tables as below:

customer

+-----------+----------+---------+
|customer_id|state_code|city_code|
+-----------+----------+---------+

customer_address

+-----------+-------------+
|customer_id|facility_name|
+-----------+-------------+

loan

+-----------+----------+---------+
|loan_code  |state_code|city_code|
+-----------+----------+---------+

customer = customer_address is one-to-one and customer = loan is one-to-many.

We used to have two different queries to get list of customers by statewise and citywise, who have loans and are "active in business" (by having a record in customer_address!) as below:

SELECT CUST.STATE_CODE, CUST.CITY_CODE, CUST_ADRS.FAC_NAME 
FROM  CUSTOMER CUST, CUST_ADDRESS WHERE CUST_ADRS.ADR_ID <> 0 AND      
CUST_ADRS.CUST_ID = CUST.CUST_ID

The result of the above query is collected and each state and city is passed to below query from PreparedStatement. If there is a loan_id exists, then we collect the state, city and facility name.

SELECT CL.LOAN_ID FROM CUSTOMER_LOAN CL WHERE
CL.LOAN_CODE IN ('A1', 'BC') AND CL.STATE_CODE = ? AND CL.CITY_CODE = ?

I have rewritten these two queries into a single query. (There are indices in place for customer.cust_id, customer_loan.loan_id). I did not include loan_id in the modified query.

SELECT DISTINCT CUST.STATE_CODE, CUST.CITY_CODE, CUST_ADRS.FAC_NAME
FROM CUSTOMER CUST INNER JOIN CUST_ADDRESS CUST_ADRS ON 
CUST_ADRS.ADR_ID <> 0 AND CUST.CUST_ID = CUST_ADRS.CUST_ID
INNER JOIN CUSTOMER_LOAN CL
ON 
CL.LOAN_CODE IN ('A1', 'BC') and
CL.STATE_CODE = CUST.STATE_CODE and
CL.CITY_CODE = CUST.CITY_CODE

Now I could see performance has significantly improved in the web application and query execution time takes some 700 ms. But I wonder if there is anything I can do to improve or modify this query. Any inputs or thoughts are greatly appreciated.


Solution

  • One option which might be faster would be to use EXISTS:

    select c.state_code, c.city_code, ca.fac_name
    from customer c 
        join customer_address ca on c.customer_id = ca.customer_id 
    where exists (
        select 1
        from loan l
        where l.state_code = c.state_code and
              l.city_code = c.city_code and
              l.loan_code in ('A1','BC')
        )
    

    As always though, you need to test each query for yourself to see which performs the best.