I have a large query that drives a business report - returns about 150k rows but has many steps. I am trying to join to a view that returns about 400k rows.
The view has a material that equals my main query material and a country that matches my main query country - however, the material can be either 10 or 11 digits in the view and thus match one or the other in the main query.
I am trying to join the data like this:
LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD
ON (
LENGTH(RD.MATERIAL_CTRY) = 12 AND
SP.PHASE_IN_RESERVED_SAP_ID = RD.SKU AND
COUNTRY_ABV = RD.COUNTRY_CODE
)
OR (
LENGTH(RD.MATERIAL_CTRY) = 13 AND
SP."Phase In Reserved Legacy ID" = RD.SKU AND
COUNTRY_ABV = RD.COUNTRY_CODE
)
OR (
LENGTH(RD.MATERIAL_CTRY) = 13 AND
SP.PHASE_IN_RESERVED_UU = RD.SKU AND
COUNTRY_ABV = RD.COUNTRY_CODE
)
However, it just runs endlessly at about 51% bytes scanned.
If I change it to this:
LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD ON SP.PHASE_IN_RESERVED_SAP_ID = RD.SKU AND COUNTRY_ABV = RD.COUNTRY_CODE
it executes quickly. Am I missing something in the first JOIN that is causing such a delay?
As mentioned above, UNION ALL is a popular way to solve this. We have also done this in the past though:
LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD1
ON LENGTH(RD1.MATERIAL_CTRY) = 12 AND
SP.PHASE_IN_RESERVED_SAP_ID = RD1.SKU AND
COUNTRY_ABV = RD1.COUNTRY_CODE
LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD2
ON LENGTH(RD2.MATERIAL_CTRY) = 13 AND
SP."Phase In Reserved Legacy ID" = RD2.SKU AND
COUNTRY_ABV = RD2.COUNTRY_CODE
LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD3
ON LENGTH(RD3.MATERIAL_CTRY) = 13 AND
SP.PHASE_IN_RESERVED_UU = RD3.SKU AND
COUNTRY_ABV = RD3.COUNTRY_CODE
And then you would run this sort of thing:
COALESCE(RD1.<field_name>,RD2.<field_name>,RD3.<field_name>) AS <field_name>