google-bigquerychrome-ux-report

BigQuery COALESCE for synonymous structs in wildcard query


In the table chrome-ux-report.all.201910 and earlier we had a field named experimental.first_input_delay. As of chrome-ux-report.all.201911 the same data has been renamed to first_input.delay.

Prior to this change, I've used wildcard queries like chrome-ux-report.all.* to aggregate all YYYYMM data, but now these queries are failing because the field name is different. I'm looking for a fix that can accommodate either old or new field names. Here's a simplified example:

SELECT
  COALESCE(first_input.delay.histogram.bin, experimental.first_input_delay.histogram.bin) AS fid
FROM
  `chrome-ux-report.all.*`

This results in an error that first_input_delay doesn't exist in the schema for the experimental struct:

Error: Field name first_input_delay does not exist in STRUCT<time_to_first_byte STRUCT<histogram STRUCT<bin ARRAY<STRUCT<start INT64, end INT64, density FLOAT64>>>>>` at [2:58]

Of course, the field exists in that struct for some of the tables covered by the wildcard but not others. It seems the validator only looks at the most recent table.

So my question is whether it's possible to use something like COALESCE to accommodate a field that gets renamed across tables? I understand that the schema makes this harder for us and a better solution is to use a single partitioned table but I'd like to hear if this is solvable given our current setup.


Solution

  • Try offering your users a view - a starting point could be:

    CREATE OR REPLACE VIEW `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001`
    AS
    SELECT * EXCEPT(experimental)
      , experimental.first_input_delay.histogram.bin AS fid
      , CONCAT('2018', _table_suffix) ts
    FROM `chrome-ux-report.all.2018*` 
    UNION ALL
    SELECT * EXCEPT(largest_contentful_paint,  experimental), experimental.first_input_delay.histogram.bin
      , CONCAT('20190', _table_suffix) ts
    FROM `chrome-ux-report.all.20190*`  
    UNION ALL
    SELECT * EXCEPT(largest_contentful_paint,  experimental), experimental.first_input_delay.histogram.bin
      , '201910'
    FROM `chrome-ux-report.all.201910`   
    UNION ALL
    SELECT * EXCEPT(largest_contentful_paint,  experimental, first_input, layout_instability), first_input.delay.histogram.bin
      , '201911'
    FROM `chrome-ux-report.all.201911`   
    UNION ALL
    SELECT * EXCEPT(largest_contentful_paint,  experimental, first_input, layout_instability), first_input.delay.histogram.bin
      , '201912'
    FROM `chrome-ux-report.all.201912`   
    

    Now your users can run queries like:

    SELECT ts, origin, fid
    FROM `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001` 
    LIMIT 10
    

    enter image description here

    Ps: These tables really need to be clustered - this query would process significantly less bytes if the tables were so.