google-bigquery

Is there something like Spark's unionByName in BigQuery?


I'd like to concatenate tables with different schemas, filling unknown values with null.

Simply using UNION ALL of course does not work like this:

WITH
  x AS (SELECT 1 AS a, 2 AS b ),
  y AS (SELECT 3 AS b, 4 AS c )
SELECT * FROM x
UNION ALL
SELECT * FROM y
a b
1 2
3 4

(unwanted result)

In Spark, I'd use unionByName to get the following result:

a b c
1 2 
  3 4

(wanted result)

Of course, I can manually create the needed query (adding nullss) in BigQuery like so:

SELECT a, b, NULL c FROM x
UNION ALL
SELECT NULL a, b, c FROM y

But I'd prefer to have a generic solution, not requiring me to generate something like that.

So, is there something like unionByName in BigQuery? Or can one come up with a generic SQL function for this?


Solution

  • Consider below approach (I think it is as generic as one can get)

    create temp function json_extract_keys(input string) returns array<string> language js as """
      return Object.keys(JSON.parse(input));
    """;
    create temp function json_extract_values(input string) returns array<string> language js as """
      return Object.values(JSON.parse(input));
    """;
    
    create temp table temp_table as (
      select json, key, value
      from (
        select to_json_string(t) json from table_x as t
        union all
        select to_json_string(t) from table_y as t
      ) t, unnest(json_extract_keys(json)) key with offset
      join unnest(json_extract_values(json)) value with offset
      using(offset)
      order by key
    );
    
    execute immediate(select '''
    select * except(json) from temp_table
    pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
    from temp_table
    )          
    

    if applied to sample data in your question - output is

    enter image description here