[SOLVED] Is there something like Spark's unionByName in BigQuery?

Is there something like Spark's unionByName in BigQuery?

I'd like to concatenate tables with different schemas, filling unknown values with null.

Simply using UNION ALL of course does not work like this:

WITH
  x AS (SELECT 1 AS a, 2 AS b ),
  y AS (SELECT 3 AS b, 4 AS c )
SELECT * FROM x
UNION ALL
SELECT * FROM y

a b
1 2
3 4

(unwanted result)

In Spark, I'd use unionByName to get the following result:

a b c
1 2 
  3 4

(wanted result)

Of course, I can manually create the needed query (adding nullss) in BigQuery like so:

SELECT a, b, NULL c FROM x
UNION ALL
SELECT NULL a, b, c FROM y

But I'd prefer to have a generic solution, not requiring me to generate something like that.

So, is there something like unionByName in BigQuery? Or can one come up with a generic SQL function for this?

Solution

Consider below approach (I think it is as generic as one can get)

create temp function json_extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
""";
create temp function json_extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
""";

create temp table temp_table as (
  select json, key, value
  from (
    select to_json_string(t) json from table_x as t
    union all
    select to_json_string(t) from table_y as t
  ) t, unnest(json_extract_keys(json)) key with offset
  join unnest(json_extract_values(json)) value with offset
  using(offset)
  order by key
);

execute immediate(select '''
select * except(json) from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
)

if applied to sample data in your question - output is