google-bigquery materialized-views

Huge discrepancy between BigQuery materialized view logical size and processing size


I am relatively new to BigQuery. My understanding was that a materialized view precomputes its results from the source table at each refresh interval, which saves time when querying. I assumed that the bytes processed would be based on the logical bytes size.

In a project, the main table shows ~10 GB of logical bytes. The materialized view created from it, with auto refresh enabled (every 6 hours), shows ~9 GB of logical bytes. When I try to query the materialized view with select * from dataset.materialized_view_table, the estimate says "This query will process ~18 GB when run."

How is 18 GB of processing possible when the source table and the materialized view are ~10 GB and ~9 GB respectively? The actual query used to build the materialized view also shows "This query will process ~10 GB when run".

If I run the query against the materialized view a second time, will it still show that it will process 18 GB? I am not allowed to run it, but I am curious.

Please shed some light on this.


Solution

  • The cost of a query is estimated before the query is actually run. At that point only the referenced fields are known, so the estimate shown to the user is the column-wise sum of their sizes.

    Since the data distribution for the requested job is not known beforehand, the estimate is essentially an upper bound on the data the query would have to scan.

    Partitioning and clustering your tables can help reduce the amount of data that queries process, and hence the number of bytes scanned.

    It can also depend on the pricing model you are using, such as on-demand pricing or capacity-based pricing, as per this document.

    For a detailed investigation of your issue, I would suggest creating a new GCP support case if you have a support plan. Otherwise, you can open a new issue on the issue tracker describing the problem.
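
As a rough illustration of the column-wise estimate described above, here is a minimal Python sketch. It is not BigQuery's actual estimator. The column names and sizes are hypothetical, made up for the example, and the table is assumed to have no partitioning or clustering. It only shows why the estimate depends on which columns are referenced rather than on how many rows end up matching.

```python
# Toy model of a column-wise, upper-bound scan estimate.
# All column names and sizes below are hypothetical and chosen only
# for illustration; they do not come from the question.
GB = 1024 ** 3

column_bytes = {
    "event_id": 1 * GB,
    "user_id": 2 * GB,
    "payload": 6 * GB,
}


def estimate_scanned_bytes(column_bytes, referenced_columns):
    """Return the sum of the full sizes of every referenced column."""
    return sum(column_bytes[name] for name in referenced_columns)


# SELECT * references every column, so the estimate is the whole table.
select_star = estimate_scanned_bytes(column_bytes, column_bytes.keys())

# Referencing two columns is estimated at those columns' full sizes,
# no matter how many rows a filter would later discard.
two_columns = estimate_scanned_bytes(column_bytes, ["event_id", "user_id"])

print(select_star / GB)  # 9.0
print(two_columns / GB)  # 3.0
```

In this toy model, selecting fewer columns is the only way to lower the estimate, which matches the answer's point that the figure shown before a run is an upper bound.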