sqlgroup-bygoogle-bigqueryleft-joinnested-table

How to Group By all columns of table A when joining two tables to create one big nested table on BigQuery?


I'm trying to create a big nested table that would be composed of many tables, such as my Clients table, Phone Numbers, Emails... They all have in common the client_id field. For the moment I have to following query that works well ("join" the Clients table fields and the according Phone Numbers fields):

SELECT Clients.*, ARRAY_AGG( STRUCT(Timestamp, Country_Code, Local_Number, Phone_Number, Whatsapp)) as Phones
FROM Clients LEFT JOIN Phones USING(client_id)
GROUP BY Client.client_id, Clients.Timestamp, Clients.First_Name, Clients.Last_Name, Clients.DOB

Client.client_id, Clients.Timestamp, Clients.First_Name, Clients.Last_Name, Clients.DOB are all my fields in Clients table. I would like to use this query as subquery to "join" it to the Emails table in a similar way (using with and renaming the result of the subquery). The Thing is that I would like to GROUP BY all the fields of Clients table without writing them all every time. Neither GROUP BY Clients.* nor GROUP BY ALL work...

What can I do to shorten this ?


Solution

  • If client_id is unique, then you can just aggregate by that. What you want is to get all the columns when you do that. A very BigQuery'ish approach is:

    SELECT ANY_VALUE(c).*, 
           ARRAY_AGG( STRUCT(p.Timestamp, p.Country_Code, p.Local_Number, p.Phone_Number, p.Whatsapp)) as Phones
    FROM Clients c LEFT JOIN
         Phones p
         USING (client_id)
    GROUP BY c.client_id;
    

    This works fine when I run it:

    WITH clients as (
          select 'x' as name, 1 as client_id union all 
          select 'y' as name, 2 as client_id
        ),
        phones as (
            select current_timestamp as timestamp, 1 as client_id, 'abc' as country_code, 111 as local_number, 222 as phone_number, null as whatsapp
        )
    SELECT ANY_VALUE(c).*, 
           ARRAY_AGG( STRUCT(p.Timestamp, p.Country_Code, p.Local_Number, p.Phone_Number, p.Whatsapp)) as Phones
    FROM Clients c LEFT JOIN
         Phones p
         USING (client_id)
    GROUP BY c.client_id;