sqlsnowflake-cloud-data-platform

Snowflake Column Name Resolution in SELECT Vs FROM Clause


The following Toy Example doesn't behave as intended:

WITH SAMPLE_DATA AS (
    SELECT * FROM VALUES 
        ('Voldemort', NULL),
        ('Harry','Potter'),
        ('Hagrid','') 
    AS PEOPLE(FIRST_NAME,SURNAME)
) SELECT 
    FIRST_NAME,
    COALESCE(SURNAME,'') SURNAME,
    IFF(SURNAME='',FIRST_NAME,CONCAT(FIRST_NAME,' ',SURNAME)) FULL_NAME, 
    CONCAT('Dear ',FULL_NAME,',') SALUTATION
FROM SAMPLE_DATA SD
ORDER BY SURNAME, FIRST_NAME;

References to SURNAME after COALESCE(SURNAME,'') SURNAME, in the SELECT-Column-List Clause resolve to the Column in the CTE:

Resultset showing NULL columns

This is a partial fix:

WITH SAMPLE_DATA AS (
    SELECT * FROM VALUES 
        ('Voldemort', NULL),
        ('Harry','Potter'),
        ('Hagrid','') 
    AS PEOPLE(FIRST_NAME,SURNAME)
) SELECT 
    FIRST_NAME,
    COALESCE(SURNAME,'') SURNAME_CLEANED,
    IFF(SURNAME_CLEANED='',FIRST_NAME,CONCAT(FIRST_NAME,' ',SURNAME_CLEANED)) FULL_NAME, 
    CONCAT('Dear ',FULL_NAME,',') SALUTATION
FROM SAMPLE_DATA SD
ORDER BY SURNAME_CLEANED, FIRST_NAME;

Giving the Cleaned Column a different Alias means Snowflake resolves to the right version. However the Resultset Column is now called SURNAME_CLEANED and the intention is to Hide the Data Clean-up. It should be Named SURNAME.

That can be easily fixed with a second CTE but is there a way of disambiguating to the Column Reference to the Columns in the SELECT rather than the FROM Clause?

Ambiguous Column Names can usually be resolved by Qualifying with a Name or Alias but there doesn't appear to be a way to do that here.


Solution

  • So you have correct identified there is ambiguity, and it has to be handled to get the expect results.

    And even if the results the default order of precedence (tables before select) was wanted the ambiguity should be explicitly handled to help the next person.

    So you have two options before or after.

    One before option I assume is the toy example of renamed the CTE results:

    WITH SAMPLE_DATA AS (
        SELECT * FROM VALUES 
            ('Voldemort', NULL),
            ('Harry','Potter'),
            ('Hagrid','') 
        AS PEOPLE(FIRST_NAME,__SURNAME)
    ) SELECT 
        FIRST_NAME,
        COALESCE(sd.__SURNAME,'') as SURNAME,
        IFF(SURNAME='', FIRST_NAME, CONCAT(FIRST_NAME,' ',SURNAME)) FULL_NAME, 
        CONCAT('Dear ', FULL_NAME, ',') SALUTATION
    FROM SAMPLE_DATA as SD
    ORDER BY SURNAME, FIRST_NAME;
    

    the real before is to sub-select, before:

    WITH SAMPLE_DATA AS (
        SELECT * FROM VALUES 
            ('Voldemort', NULL),
            ('Harry','Potter'),
            ('Hagrid','') 
        AS PEOPLE(FIRST_NAME,SURNAME)
    ) 
    SELECT 
        FIRST_NAME,
        COALESCE(sd.__SURNAME,'') as SURNAME,
        IFF(SURNAME='', FIRST_NAME, CONCAT(FIRST_NAME,' ',SURNAME)) FULL_NAME, 
        CONCAT('Dear ', FULL_NAME, ',') SALUTATION
    FROM (select * rename surname as __surname from SAMPLE_DATA) as SD
    ORDER BY SURNAME, FIRST_NAME;
    

    OR after, like you note:

    WITH SAMPLE_DATA AS (
        SELECT * FROM VALUES 
            ('Voldemort', NULL),
            ('Harry','Potter'),
            ('Hagrid','') 
        AS PEOPLE(FIRST_NAME,SURNAME)
    )
    select * rename x as surname from (
        SELECT 
            FIRST_NAME,
            COALESCE(sd.SURNAME,'') as x,
            IFF(x='', FIRST_NAME, CONCAT(FIRST_NAME,' ',x)) FULL_NAME, 
            CONCAT('Dear ', FULL_NAME, ',') SALUTATION
        FROM SAMPLE_DATA as SD
    )
    ORDER BY SURNAME, FIRST_NAME; 
    

    In ANSI SQL you have to name all columns explicitly, but the tool that most likely will help here is RENAME