I'd like to concatenate tables with different schemas, filling unknown values with null.
Simply using UNION ALL of course does not work like this:
WITH
x AS (SELECT 1 AS a, 2 AS b ),
y AS (SELECT 3 AS b, 4 AS c )
SELECT * FROM x
UNION ALL
SELECT * FROM y
a b
1 2
3 4
(unwanted result)
In Spark, I'd use unionByName to get the following result:
a b c
1 2
3 4
(wanted result)
Of course, I can manually create the needed query (adding nullss) in BigQuery like so:
SELECT a, b, NULL c FROM x
UNION ALL
SELECT NULL a, b, c FROM y
But I'd prefer to have a generic solution, not requiring me to generate something like that.
So, is there something like unionByName in BigQuery? Or can one come up with a generic SQL function for this?
Consider below approach (I think it is as generic as one can get)
create temp function json_extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function json_extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
create temp table temp_table as (
select json, key, value
from (
select to_json_string(t) json from table_x as t
union all
select to_json_string(t) from table_y as t
) t, unnest(json_extract_keys(json)) key with offset
join unnest(json_extract_values(json)) value with offset
using(offset)
order by key
);
execute immediate(select '''
select * except(json) from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
)
if applied to sample data in your question - output is