I am trying to understand why the group by function is yielding different results in snowflake depending on how I reference the group-by fields. Here are two Queries that I believe should yield the same result, but do NOT:
Query using explicit field alias references:
select
hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
,field1
,field2
,field3
,field4
,count(*) as count
from <table>
where
<some filters>
group by hash, field1, field2, field3, field4;
Query using positional references to fields:
select
hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
,field1
,field2
,field3
,field4
,count(*) as count
from <table>
where
<same filters as above>
group by 1,2,3,4,5;
The first query yields significantly more records, suggesting maybe it isn't applying a grouping field that is being applied in the second query, but based on the snowflake docs I really believe they should be the same. How are these two different?
The clue is that the aliased expression hash
does not overshadow existing columns so:
select
hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
,field1
,field2
,field3
,field4
,count(*) as count
from <table>
where <some filters>
group by hash, field1, field2, field3, field4;
is
select
hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
,field1
,field2
,field3
,field4
,count(*) as count
from <table>
where <some filters>
group by <table>.hash, field1, field2, field3, field4;
which is different to:
select
hash('SHA2_256', CONCAT(field1,field2,field3,field4)) as hash
,field1
,field2
,field3
,field4
,count(*) as count
from <table>
where <same filters as above>
group by 1,2,3,4,5;