I'm using snowflake
to attempt to bucket rows based on two columns and if that bucket size is greater then 1 then I need the value of of some new column to be 1, otherwise I would set it to 0. It's similar to NTILE()
except that I would need this to be dynamic.
The table that I have looks like the following:
ARTICLE_ID USER_ID COUNTRY_NAME NUM_COUNTRIES NUM_AUTHORS
------------------------------------------------------------------------
30522143 ecf330d Singapore 1 4
30522143 86b0faa Singapore 1 4
30522143 d331b33 Singapore 1 4
30522143 d331b33 Singapore 1 4
------------------------------------------------------------------------
30522139 1d08cfd United States 2 3
30522139 ec9c065 Canada 2 3
30522139 a750bff United States 2 3
------------------------------------------------------------------------
23410385 1d08cfd United States 3 3
23410385 ec9c065 Mexico 3 3
23410385 a41b19c France 3 3
------------------------------------------------------------------------
30285132 d331b33 Chile 2 4
30285132 1d08cf2 Peru 2 4
30285132 a750bff Chile 2 4
30285132 d742bb5 Peru 2 4
------------------------------------------------------------------------
...
The two columns that I would need to bucket are by ARTICLE_ID
then COUNTRY_NAME
. Looking at the above table if we take ARTICLE_ID
30522143
we would have 1
bucket of 4
USER_ID
since they are all Singapore
then the value of HAS_REPEATED_COUNTRY
would be 1.
In the case of ARTICLE_ID
30522139
we would have two sub-buckets one for United States
containing 2
USER_ID
s and another sub-bucket (Canada)
containing 1
USER_ID
which would result in all those users associated with United States
to have a value of 1
and the sole member associated with Canada
to have a value of 0
in HAS_REPEATED_COUNTRY
.
Hence the table would resemble:
ARTICLE_ID USER_ID COUNTRY_NAME NUM_COUNTRIES NUM_AUTHORS HAS_REPEATED_COUNTRY
------------------------------------------------------------------------------------------------
30522143 ecf330d Singapore 1 4 1
30522143 86b0faa Singapore 1 4 1
30522143 d331b33 Singapore 1 4 1
30522143 d331b33 Singapore 1 4 1
------------------------------------------------------------------------------------------------
30522139 1d08cfd United States 2 3 1
30522139 ec9c065 Canada 2 3 0
30522139 a750bff United States 2 3 1
------------------------------------------------------------------------------------------------
23410385 1d08cfd United States 3 3 0
23410385 ec9c065 Mexico 3 3 0
23410385 a41b19c France 3 3 0
------------------------------------------------------------------------------------------------
30285132 d331b33 Chile 2 4 1
30285132 1d08cf2 Peru 2 4 1
30285132 a750bff Chile 2 4 1
30285132 d742bb5 Peru 2 4 1
------------------------------------------------------------------------------------------------
...
Is there any way of producing the latter table from the former?
It seems like you want to flag countries that appear more than once for a given article. If so, you can just use a window count:
select
t.*,
case when count(*) over(partition by article_id, country_name) > 1
then 1 else 0
end as has_repeated_country
from mytable t