
Table monitoring in BigQuery


I have a set of CRITICAL tables in BigQuery that are loaded hourly by DAGs.

I have been tasked with developing a standalone solution to check the following:

  1. Are the tables present? [There is a chance that the tables may get deleted by the operations team.]

  2. If the table is present, is it getting loaded on time?

  3. If the table is getting loaded, is there a difference in size between consecutive runs? [The table is expected to increase in size.]

If any of the above checks fails, the operations team has to be notified as soon as possible.

Can someone suggest a solution (probably a service or a list of services) for the above requirement?


Solution

  • Part one: Building queries:

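    1. The existence of a table can be checked by querying the table schema metadata (for example, the dataset's INFORMATION_SCHEMA.TABLES view).
    2. To check whether a table is loaded on time, you need to query every table and look for changes.
    3. The storage size over the last few days is available in INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE: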
    -- Storage usage history per table in the US region
    SELECT *
    FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE;
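
The three checks can also be combined in a small script. The following is only a minimal sketch, not a definitive implementation. It assumes the google-cloud-bigquery Python client is installed and uses its `Client.get_table` call together with the `modified` and `num_bytes` table properties. The names `TableSnapshot`, `fetch_snapshot`, `run_checks` and the 75-minute `max_age` threshold are hypothetical and chosen for illustration. Note that the last-modified time is only an approximation of "loaded on time", since it may also change on metadata updates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class TableSnapshot:
    """Metadata of one table at one point in time (hypothetical helper type)."""
    exists: bool
    last_modified: Optional[datetime] = None  # timezone-aware, UTC
    size_bytes: Optional[int] = None


def fetch_snapshot(table_id: str) -> TableSnapshot:
    """Read table metadata through the google-cloud-bigquery client."""
    # Imported lazily so the pure checks below run without the library installed.
    from google.api_core.exceptions import NotFound
    from google.cloud import bigquery

    client = bigquery.Client()
    try:
        table = client.get_table(table_id)  # e.g. "project.dataset.table"
    except NotFound:
        return TableSnapshot(exists=False)
    return TableSnapshot(True, table.modified, table.num_bytes)


def run_checks(
    current: TableSnapshot,
    previous_size: Optional[int],
    now: datetime,
    max_age: timedelta = timedelta(hours=1, minutes=15),  # assumed threshold
) -> List[str]:
    """Return a list of failed checks; an empty list means all checks passed."""
    # Check 1: the table must exist; the other checks make no sense otherwise.
    if not current.exists:
        return ["table is missing"]

    failures = []
    # Check 2: the table must have been modified recently enough.
    if current.last_modified is None or now - current.last_modified > max_age:
        failures.append("table was not loaded on time")
    # Check 3: the table must have grown since the previous run (if one is known).
    if previous_size is not None and (
        current.size_bytes is None or current.size_bytes <= previous_size
    ):
        failures.append("table size did not increase")
    return failures
```

A run would call `run_checks(fetch_snapshot("project.dataset.table"), previous_size, datetime.now(timezone.utc))`, store the current size for the next run, and pass any returned failures to whichever alerting mechanism is used.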
    

    For the alerting, use one of the following: