sqlgoogle-cloud-platformgoogle-bigquery

BigQuery select __TABLES__ from all tables within project?


Using BigQuery, is there a way I can select __TABLES__ from every dataset within my project? I've tried SELECT * FROM '*.__TABLES' but that is not allowed within BigQuery. Any help would be great, thanks!


Solution

  • __TABLES__ syntax is supported only for specific dataset and does not work across datasets

    What you can do is something as below

    #standardSQL
    WITH ALL__TABLES__ AS (
      SELECT * FROM `bigquery-public-data.1000_genomes.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.baseball.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.bls.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.census_bureau_usa.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.cloud_storage_geo_index.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.cms_codes.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.common_us.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.fec.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.genomics_cannabis.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.ghcn_d.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.ghcn_m.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.github_repos.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.hacker_news.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.irs_990.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.medicare.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.new_york.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.nlm_rxnorm.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.noaa_gsod.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.open_images.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.samples.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.san_francisco.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.stackoverflow.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.usa_names.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.utility_us.__TABLES__` 
    )
    SELECT *
    FROM ALL__TABLES__
    

    In this case you need to know in advance list of datasets, which you can easily do via Datasets: list API or using respective bq ls

    Please note: above approach will work only for datasets with data in same location. If you have datasets with data in different locations you will need to query them in two different queries

    For example:

    #standardSQL
    WITH ALL_EU__TABLES__ AS (
      SELECT * FROM `bigquery-public-data.common_eu.__TABLES__` UNION ALL
      SELECT * FROM `bigquery-public-data.utility_eu.__TABLES__` 
    )
    SELECT *
    FROM ALL_EU__TABLES__