I have Firestore database with an installed Google BigQuery extension, so all our data ends up in Google's BigQuery.
the structure of our 'raw_changelog' table looks like this.
my queation is related to the data
column (which is of type JSON) and the fact I cannot seem to build a Materialized View (MV) on top it.
I'd like to create the following MV;
CREATE OR REPLACE MATERIALIZED VIEW `XXX.XXX.v_data`
PARTITION BY DATE(`published_at`)
CLUSTER BY `event_id`
OPTIONS (
enable_refresh=true,
refresh_interval_minutes=1
)
AS
SELECT
`published_at`,
`event_id`,
ARRAY_AGG(`tags_unnest`.`name` IGNORE NULLS) AS `tags`
FROM `XXX.XXX.XXX_raw_latest` AS `o`
LEFT JOIN UNNEST(JSON_QUERY_ARRAY(`data`.`tags`)) AS `tags_unnest`
WHERE
DATE(`o`.`published_at`) > '2021-01-01'
GROUP BY
`published_at`,
`event_id`
However I'm getting the following error;
I found at that the error is related to line;
ARRAY_AGG(`tags_unnest`.`name` IGNORE NULLS) AS `tags`
I verified that tags_unnest
is of type object (coming from the JSON data
field) and it seems the issue comess from the fact tags_unnest
is an object I guess.
if I replace the above line with;
ARRAY_AGG('aa' IGNORE NULLS) AS `tags`
OR
ARRAY_AGG(`published_at` IGNORE NULLS) AS `tags`
then it works fine and I can go ahead create the MV.
Perhaps also good to note; in the source table, if I change the type of the data
field to 'STRING' then my original create-MV-query also runs without any issues..
Can anyone tell me why I cannot create the MV and possibly also a way around my problem?
There is a workaround for the Materialized View: Please replace the LEFT JOIN
by a ,
For a Materialized View (MV) there is a limit amount of calculations allowed.
The reference says that Left/right/full outer joins are not allowed in MV and e.g. only one GROUP BY
clause operation is allowed.
Your query contains the following 4 task operations:
unnest
-ing an json array, containing an objectleft join
.name
or .x
) andarray_agg
.If you use one less operation, it should work. Replace the left join
by a ,
Lets build a dummy table:
CREATE OR REPLACE TABLE
Test.ajson AS
SELECT
1 published_at, 1 event_id, json ' {"a":[1],"tags":[{"x":1}]} ' AS DATA, [1,2] arr
UNION ALL SELECT 2, 2, json ' {"a":[2],"tags":[{"x":2},{"x":20}]} ', []
UNION ALL SELECT 1, 1, json ' {"tags":[{"x":10}]} ', []
UNION ALL SELECT 5,5,json "{}", []
Now build the MV:
CREATE OR REPLACE MATERIALIZED VIEW `Test.ajsonview` as
SELECT
published_at,
event_id,
ARRAY_AGG(`tags_unnest`.`x` IGNORE NULLS) AS `tags`,
# ARRAY_AGG(`tags_unnest` IGNORE NULLS) AS testing
FROM Test.ajson
, # works
#left join # throws error: Incremental materialized view query contains unsupported feature.
UNNEST(JSON_QUERY_ARRAY(`data`.`tags`)) tags_unnest
WHERE published_at>0
group by 1,2
Note that the ARRAY_AGG
without the .x
:
ARRAY_AGG(`tags_unnest` IGNORE NULLS) AS testing
is working in combination with LEFT JOIN
.
The Tags
column can be empty, therefore the functionality of LEFT JOIN
is needed and ,
does not work directly. There is dirty workaround for that: Duplicate the table with unnest([1,2]) as temp01
. In case for temp01=1
we unnest the tags
. For temp01=2
a dummy json is added to the dataset [json '{"x":-9877}']
. In the array_agg
only the temp01=1
tags are keept.
CREATE OR REPLACE MATERIALIZED VIEW `Test.ajsonview` as
SELECT
published_at,
event_id,
ARRAY_AGG(if(temp01=1,`tags_unnest`.`x`,null) IGNORE NULLS) AS `tags`,
FROM Test.ajson,
unnest([1,2]) as temp01,
UNNEST(if(1=temp01,JSON_QUERY_ARRAY(`data`.`tags`),[json '{"x":-9877}'])) tags_unnest
WHERE published_at>0
group by 1,2