I understand number of rows can increase when UNNEST an array of structs, but why would it decrease?
Scenario: Export Google Cloud Platform Billing Data from BigQuery to Azure Synapse Analytics via the Synapse's Serverless SQL Pool using its Copy Data Activity (Synapse pipeline). The source dataset is set to BigQuery, and use query to unnest the array of struct columns.
Here is the ddl from table's schema:
billing_account_id STRING,
project STRUCT<id STRING, number STRING, name STRING, labels ARRAY<STRUCT<key STRING, value STRING>>, ancestry_numbers STRING, ancestors ARRAY<STRUCT<resource_name STRING, display_name STRING>>>,
labels ARRAY<STRUCT<key STRING, value STRING>>,
system_labels ARRAY<STRUCT<key STRING, value STRING>>,
resource STRUCT<name STRING, global_name STRING>,
usage STRUCT<amount FLOAT64, unit STRING, amount_in_pricing_units FLOAT64, pricing_unit STRING>,
credits ARRAY<STRUCT<name STRING, amount FLOAT64, full_name STRING, id STRING, type STRING>>,
invoice STRUCT<month STRING>,
cost_type STRING,
adjustment_info STRUCT<id STRING, description STRING, mode STRING, type STRING>
Here is my count row query, and result:
SELECT count(1)
FROM `export.gcp_billing_export` tbl, =>488,861 rows
UNNEST (project.labels) AS ar_proj_labels, =>236,567 rows, why a decrease from above?
UNNEST (project.ancestors) AS ar_proj_ancestors, =>1,241,985, an increase, as expected
UNNEST (labels) AS ar_labels, =>2,077,164, an increase, as expected
UNNEST (system_labels) AS ar_system_labels, =>3,639,408, an increase, as expected
UNNEST (credits) AS tbl_credits =>4,752, a big drop in number of rows, why?
Looking forward to hearing your thoughts. Thank you in advance.
Unnest will ignore the rows with null while flattening. There could be more null values in the project.labels
field. That is the reason, you get less count even when unnesting. If you want to have all rows including rows with null value, then as @jaytiger suggested replace comma ,
and use left join
.
,
and left join
for a sample table.img:1 sample table
, unnest
:WITH c AS (
SELECT 'Asia' AS continent, ['India','Singapore','Japan'] AS countries
UNION ALL SELECT 'Europe', ['Spain','England','France']
UNION ALL SELECT 'Australia', ['Australia','NZ']
UNION ALL SELECT 'Antarctica', NULL
)
SELECT continent,countries FROM c,
UNNEST(countries) AS countries;
img:2 unnested data without null row.
left join unnest
: WITH c AS (
SELECT 'Asia' AS continent, ['India','Singapore','Japan'] AS countries
UNION ALL SELECT 'Europe', ['Spain','England','France']
UNION ALL SELECT 'Australia', ['Australia','NZ']
UNION ALL SELECT 'Antarctica', NULL
)
SELECT continent,countries FROM c
left join
UNNEST(countries) AS countries;