sqlgoogle-bigquery

Selecting rows with matching multiple columns (order doesn't matter)


I am wanting to determine the number of distinct user_ids that have the same 3 animals as any user (where order doesn't matter).

For example below, user 11 and 13 both have dog, cat, bird so they would both be counted.

user_id Col1 Col2 Col3
11 dog cat bird
12 cow dog bird
13 cat bird dog

The desired output here would be

distinct_users distinct_users_with_a_match
3 2

Solution

  • Try below

    with temp as (
      select
        user_id, 
        (select string_agg(col order by col) from unnest([Col1, Col2, Col3]) col) h
      from your_table 
    )
    select 
      (select count(distinct user_id) from temp) as distinct_users,  
      (select count(distinct user_id) from(
        select user_id from temp qualify count(distinct user_id) over(partition by h) > 1
      )) as distinct_users_with_a_match