I have a case, where I need to do multiple joins(lookups) like below query. Sample scenario was given.
I have around 200 CAT_CODE. I thought few solutions and I listed it down as cases. Is there is any different way to write a SQL query to have better performance? or any better approach in ETL tool?
Primary Table(PRIM):
NUM CAT1_CODE CAT2_CODE CAT3_CODE
A 1 y q
B 2 e a
C 3 s z
Secondary Table(LOV):
CATEGORY COLUMN_LKP EXT_CODE
CAT1_CODE 1 AB
CAT1_CODE 2 CD
CAT1_CODE 3 HI
CAT2_CODE y JL
CAT2_CODE e QD
CAT2_CODE s AH
CAT3_CODE q CD
CAT3_CODE a MS
CAT3_CODE z EJ
CASE-1: Through SQL:
I have written a simple query to accomplish this task. Do you think, this would be right approach? Any other ways, to improve this query? Right now, I'm using both Oracle and Postgres.
SELECT
NUM,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT1_CODE' AND COLUMN_LKP=A.CAT1_CODE) CAT1,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT2_CODE' AND COLUMN_LKP=A.CAT2_CODE) CAT2,
(SELECT EXT_CODE FROM TEST_LOV
WHERE CATEGRY='CAT3_CODE' AND COLUMN_LKP=A.CAT3_CODE) CAT3
FROM
TEST_PRIM A
REQUIRED OUTPUT:
NUM CAT1 CAT2 CAT3
A AB JL CD
B CD QD MS
C HI AH EJ
CASE-2: ETL:
Same case can be accomplished through ETL. We need to use lookups to get that done.
Scenario-1:
LOV(CAT1_CODE) LOV(CAT2_CODE) LOV(CAT3_CODE)
| | |
| | |
PRIM---->LOOKUP---------->LOOKUP------------>LOOKUP-------->TARGET
I don't think, that would be right approach. We have 200 codes, we cannot use 200 lookup. Is there is any better approach to handle that in ETL (Datastage, Talend, BODS) with better performance?
Scenario-2:
Pivoting PRIM(converting CAT1_CODE,CAT2_CODE,CAT3_CODE columns in to rows) like below and doing one lookup. But pivoting will take much time, because we have data around 600 million and 200 columns.
NUM CATGRY CODE
A CAT1_CODE 1
A CAT1_CODE y
A CAT1_CODE q
B CAT2_CODE 2
B CAT2_CODE e
B CAT2_CODE a
C CAT3_CODE 3
C CAT3_CODE s
C CAT3_CODE z
Kindly suggest me some best way to handle this approach. It can be through ETL or through SQL. Thanks in advance.
You can use the LATERAL
keyword to do the magic that you are looking for.
The following code could help:
SELECT
NUM,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT1_CODE') AS CAT1,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT2_CODE') AS CAT2,
MAX(ext_code) FILTER (WHERE c.CATEGORY='CAT3_CODE') AS CAT3
FROM TEST_PRIM a
CROSS JOIN LATERAL (
SELECT *
FROM TEST_LOV b
WHERE
(a.CAT1_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT1_CODE')
OR (a.CAT2_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT2_CODE')
OR (a.CAT3_CODE=b.COLUMN_LKP AND B.CATEGORY = 'CAT3_CODE')
) c
GROUP BY NUM
ORDER BY NUM;
Output
num | cat1 | cat2 | cat3
-----+------+------+------
A | AB | JL | CD
B | CD | QD | MS
C | HI | AH | EJ