postgresqloracle-databasetalenddatastagesap-data-services

Avoiding multiple look ups and improving the performance


I have a case, where I need to do multiple joins(lookups) like below query. Sample scenario was given.

I have around 200 CAT_CODE. I thought few solutions and I listed it down as cases. Is there is any different way to write a SQL query to have better performance? or any better approach in ETL tool?

Primary Table(PRIM):

NUM     CAT1_CODE   CAT2_CODE   CAT3_CODE
A          1           y           q     
B          2           e           a     
C          3           s           z    

Secondary Table(LOV):

CATEGORY    COLUMN_LKP        EXT_CODE
CAT1_CODE       1                AB
CAT1_CODE       2                CD
CAT1_CODE       3                HI
CAT2_CODE       y                JL
CAT2_CODE       e                QD
CAT2_CODE       s                AH
CAT3_CODE       q                CD
CAT3_CODE       a                MS
CAT3_CODE       z                EJ

CASE-1: Through SQL:

I have written a simple query to accomplish this task. Do you think, this would be right approach? Any other ways, to improve this query? Right now, I'm using both Oracle and Postgres.

SELECT 
NUM,
(SELECT EXT_CODE FROM TEST_LOV 
WHERE CATEGRY='CAT1_CODE' AND COLUMN_LKP=A.CAT1_CODE) CAT1,
(SELECT EXT_CODE FROM TEST_LOV 
WHERE CATEGRY='CAT2_CODE' AND COLUMN_LKP=A.CAT2_CODE) CAT2,
(SELECT EXT_CODE FROM TEST_LOV 
WHERE CATEGRY='CAT3_CODE' AND COLUMN_LKP=A.CAT3_CODE) CAT3 
FROM 
TEST_PRIM A

REQUIRED OUTPUT:

NUM CAT1    CAT2    CAT3
A    AB      JL      CD
B    CD      QD      MS
C    HI      AH      EJ

CASE-2: ETL:

Same case can be accomplished through ETL. We need to use lookups to get that done.

Scenario-1:

       LOV(CAT1_CODE)  LOV(CAT2_CODE)   LOV(CAT3_CODE)
           |                |                  |
           |                |                  |
PRIM---->LOOKUP---------->LOOKUP------------>LOOKUP-------->TARGET

I don't think, that would be right approach. We have 200 codes, we cannot use 200 lookup. Is there is any better approach to handle that in ETL (Datastage, Talend, BODS) with better performance?

Scenario-2:

Pivoting PRIM(converting CAT1_CODE,CAT2_CODE,CAT3_CODE columns in to rows) like below and doing one lookup. But pivoting will take much time, because we have data around 600 million and 200 columns.

NUM     CATGRY           CODE
A       CAT1_CODE          1
A       CAT1_CODE          y
A       CAT1_CODE          q
B       CAT2_CODE          2
B       CAT2_CODE          e
B       CAT2_CODE          a 
C       CAT3_CODE          3
C       CAT3_CODE          s
C       CAT3_CODE          z 

Kindly suggest me some best way to handle this approach. It can be through ETL or through SQL. Thanks in advance.


Solution

  • You can use the LATERAL keyword to do the magic that you are looking for.

    The following code could help:

    SELECT 
      NUM, 
      MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT1_CODE') AS CAT1,
      MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT2_CODE') AS CAT2,
      MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT3_CODE') AS CAT3
    FROM TEST_PRIM a
      CROSS JOIN LATERAL (
        SELECT * 
        FROM TEST_LOV b 
        WHERE 
          (a.CAT1_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT1_CODE')
          OR (a.CAT2_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT2_CODE')
          OR (a.CAT3_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT3_CODE')
        ) c
     GROUP BY NUM
     ORDER BY NUM; 
    

    Output

     num | cat1 | cat2 | cat3
    -----+------+------+------
     A   | AB   | JL   | CD
     B   | CD   | QD   | MS
     C   | HI   | AH   | EJ