google-bigquerymaterialized-views

Cannot create Google BigQuery Materialized View when aggregating a value coming from a JSON type value


I have Firestore database with an installed Google BigQuery extension, so all our data ends up in Google's BigQuery.

the structure of our 'raw_changelog' table looks like this.

enter image description here

my queation is related to the data column (which is of type JSON) and the fact I cannot seem to build a Materialized View (MV) on top it.

I'd like to create the following MV;

CREATE OR REPLACE MATERIALIZED VIEW `XXX.XXX.v_data`
  PARTITION BY DATE(`published_at`)
  CLUSTER BY `event_id`
  OPTIONS (
    enable_refresh=true,
    refresh_interval_minutes=1
  )
  AS
  SELECT  
    `published_at`,
    `event_id`,
    ARRAY_AGG(`tags_unnest`.`name` IGNORE NULLS) AS `tags`
  FROM `XXX.XXX.XXX_raw_latest` AS `o`
  LEFT JOIN UNNEST(JSON_QUERY_ARRAY(`data`.`tags`)) AS `tags_unnest`
  WHERE 
    DATE(`o`.`published_at`) > '2021-01-01'
  GROUP BY
  `published_at`,
  `event_id`

However I'm getting the following error;

enter image description here

I found at that the error is related to line;

ARRAY_AGG(`tags_unnest`.`name` IGNORE NULLS) AS `tags`

I verified that tags_unnest is of type object (coming from the JSON data field) and it seems the issue comess from the fact tags_unnest is an object I guess.

if I replace the above line with;

ARRAY_AGG('aa' IGNORE NULLS) AS `tags`

OR

ARRAY_AGG(`published_at` IGNORE NULLS) AS `tags`

then it works fine and I can go ahead create the MV.

Perhaps also good to note; in the source table, if I change the type of the data field to 'STRING' then my original create-MV-query also runs without any issues..

Can anyone tell me why I cannot create the MV and possibly also a way around my problem?


Solution

  • There is a workaround for the Materialized View: Please replace the LEFT JOIN by a ,

    For a Materialized View (MV) there is a limit amount of calculations allowed. The reference says that Left/right/full outer joins are not allowed in MV and e.g. only one GROUP BY clause operation is allowed.

    Your query contains the following 4 task operations:

    If you use one less operation, it should work. Replace the left join by a ,

    Lets build a dummy table:

    CREATE OR REPLACE TABLE
      Test.ajson AS
    SELECT
      1 published_at, 1 event_id, json ' {"a":[1],"tags":[{"x":1}]}  ' AS DATA, [1,2] arr
    UNION ALL SELECT 2, 2,  json ' {"a":[2],"tags":[{"x":2},{"x":20}]}  ',  []
    UNION ALL SELECT 1, 1,  json ' {"tags":[{"x":10}]}  ',   []
    UNION ALL SELECT 5,5,json "{}", []
    

    Now build the MV:

    CREATE OR REPLACE MATERIALIZED VIEW `Test.ajsonview` as
     SELECT 
        published_at, 
        event_id,
     ARRAY_AGG(`tags_unnest`.`x` IGNORE NULLS) AS `tags`,
    # ARRAY_AGG(`tags_unnest` IGNORE NULLS) AS testing
    FROM Test.ajson 
    , # works
    #left join # throws error: Incremental materialized view query contains unsupported feature.
    UNNEST(JSON_QUERY_ARRAY(`data`.`tags`)) tags_unnest
    WHERE published_at>0
    group by 1,2 
    

    Note that the ARRAY_AGG without the .x:

     ARRAY_AGG(`tags_unnest` IGNORE NULLS) AS testing
    

    is working in combination with LEFT JOIN.

    The Tags column can be empty, therefore the functionality of LEFT JOIN is needed and , does not work directly. There is dirty workaround for that: Duplicate the table with unnest([1,2]) as temp01. In case for temp01=1 we unnest the tags. For temp01=2 a dummy json is added to the dataset [json '{"x":-9877}']. In the array_agg only the temp01=1 tags are keept.

    CREATE OR REPLACE MATERIALIZED VIEW `Test.ajsonview` as
     SELECT 
        published_at, 
        event_id,
     ARRAY_AGG(if(temp01=1,`tags_unnest`.`x`,null) IGNORE NULLS) AS `tags`,
    FROM Test.ajson,
      unnest([1,2]) as temp01, 
      UNNEST(if(1=temp01,JSON_QUERY_ARRAY(`data`.`tags`),[json '{"x":-9877}'])) tags_unnest
    WHERE published_at>0
    group by 1,2