google-bigquerydml

Scheduled queries and clustering


It does not seem to be possible to schedule Queries in BigQuery that write to time partitioned and clustered target tables (using WRITE_TRUNCATE and a partition decorator): we are getting the error message:

Invalid value: Incompatible table partitioning specification. Expects partitioning specification interval(type:day) clustering(siteId,channelId), but input partitioning specification is interval(type:day)

I don't understand why this is happening, isn't the clustering specification just a part of the table definition? We also don't need to specify anything extra when performing dml inserts data in an already clustered table. Or is this related to us not using DML in the scheduled query?

EDIT: The scheduled query structure looks like this:

SELECT
  reportDate,
  channelId,
  siteId,
  pageType,
  [MORE_COLUMNS_SELECT),
  SUM(timeOnPage) AS timeOnPage_agg,
  ARRAY_AGG(STRUCT( sessionId,
      [MORE_COLUMNS_NESTED)
 ) AS Details
  ----
FROM `project.dataset.viewname` 
WHERE    reportDate >= TIMESTAMP_TRUNC(TIMESTAMP_ADD(@run_time, INTERVAL -1 DAY), DAY)
     AND reportDate < TIMESTAMP_TRUNC(@run_time, DAY)
GROUP BY
    reportDate,
  channelId,
  siteId,
  pageType,
  [MORE_COLUMNS_SELECT)

I am writing the results of this query to the target table like this: TARGET_TABLE_NAME${run_time-24h|"%Y%m%d"}

That table is time partitioned on _PARTITIONTIME (= Reportdate) and clustered on siteId, channelId


Solution

  • As of 23-10-2018, it seems that the BigQuery scheduled query functionality does NOT support the WRITE_TRUNCATE loading pattern in combination with clustering.

    What DOES work however, is writing to a clustered target table using a DML statement.