sqlsnowflake-cloud-data-platform

Snowflake SQL group-by behaving differently depending whether columns are referenced by position or alias


I am trying to understand why the group by function is yielding different results in snowflake depending on how I reference the group-by fields. Here are two Queries that I believe should yield the same result, but do NOT:

Query using explicit field alias references:

select
    hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
    ,field1
    ,field2
    ,field3
    ,field4
    ,count(*) as count
from <table>
    where 
        <some filters>
    group by hash, field1, field2, field3, field4;

Query using positional references to fields:

select
    hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
    ,field1
    ,field2
    ,field3
    ,field4
    ,count(*) as count
from <table>
    where 
        <same filters as above>
    group by 1,2,3,4,5;

The first query yields significantly more records, suggesting maybe it isn't applying a grouping field that is being applied in the second query, but based on the snowflake docs I really believe they should be the same. How are these two different?


Solution

  • The clue is that the aliased expression hash does not overshadow existing columns so:

    select
         hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
        ,field1
        ,field2
        ,field3
        ,field4
        ,count(*) as count
    from <table>
    where <some filters>
    group by hash, field1, field2, field3, field4;
    

    is

    select
         hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
        ,field1
        ,field2
        ,field3
        ,field4
        ,count(*) as count
    from <table>
    where <some filters>
    group by <table>.hash, field1, field2, field3, field4;
    

    which is different to:

    select
         hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
        ,field1
        ,field2
        ,field3
        ,field4
        ,count(*) as count
    from <table>
    where <same filters as above>
    group by 1,2,3,4,5;