sqljoinsnowflake-cloud-data-platform

Snowflake JOIN performance Issue


I have a large query that drives a business report - returns about 150k rows but has many steps. I am trying to join to a view that returns about 400k rows.

The view has a material that equals my main query material and a country that matches my main query country - however, the material can be either 10 or 11 digits in the view and thus match one or the other in the main query.

I am trying to join the data like this:

LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD 
  ON (
       LENGTH(RD.MATERIAL_CTRY) = 12 AND
       SP.PHASE_IN_RESERVED_SAP_ID = RD.SKU AND
       COUNTRY_ABV = RD.COUNTRY_CODE
     )
  OR (
       LENGTH(RD.MATERIAL_CTRY) = 13 AND
       SP."Phase In Reserved Legacy ID" = RD.SKU AND
       COUNTRY_ABV = RD.COUNTRY_CODE
     )
  OR (
       LENGTH(RD.MATERIAL_CTRY) = 13 AND
       SP.PHASE_IN_RESERVED_UU = RD.SKU AND
       COUNTRY_ABV = RD.COUNTRY_CODE
     )

However, it just runs endlessly at about 51% bytes scanned.

If I change it to this:

LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD ON SP.PHASE_IN_RESERVED_SAP_ID = RD.SKU AND COUNTRY_ABV = RD.COUNTRY_CODE

it executes quickly. Am I missing something in the first JOIN that is causing such a delay?


Solution

  • As mentioned above, UNION ALL is a popular way to solve this. We have also done this in the past though:

    LEFT JOIN PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD1
      ON LENGTH(RD1.MATERIAL_CTRY) = 12 AND
          SP.PHASE_IN_RESERVED_SAP_ID = RD1.SKU AND
          COUNTRY_ABV = RD1.COUNTRY_CODE
    LEFT JOIN  PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD2
      ON LENGTH(RD2.MATERIAL_CTRY) = 13 AND
          SP."Phase In Reserved Legacy ID" = RD2.SKU AND
          COUNTRY_ABV = RD2.COUNTRY_CODE
    LEFT JOIN  PUBLISH_D.GSC_MTD_PUBLIC.V_PKG_LAB_REGDESK RD3
      ON LENGTH(RD3.MATERIAL_CTRY) = 13 AND
           SP.PHASE_IN_RESERVED_UU = RD3.SKU AND
           COUNTRY_ABV = RD3.COUNTRY_CODE
    

    And then you would run this sort of thing:

    COALESCE(RD1.<field_name>,RD2.<field_name>,RD3.<field_name>) AS <field_name>