sql google-bigquery

How to calculate percentage of total in BigQuery using a window function (PARTITION BY expression references ... which is neither grouped nor aggregated)


I'm trying to calculate, for a given day in BigQuery, the percentage of total delivered events that is attributed to each user.

The formula I want is: User events / Total events * 100

This is my query using a window function:

SELECT 
  TIMESTAMP_TRUNC(`dataset.table`.event_time, DAY) as date,
  `dataset.table`.user_id AS user_id,
  SAFE_DIVIDE(
    SUM(`dataset.table`.event_count),
    SUM(SUM(`dataset.table`.event_count)) OVER (PARTITION BY TIMESTAMP_TRUNC(`dataset.table`.event_time, DAY))
  ) * 100 AS event_percentage,
  SUM(`dataset.table`.event_count) AS event_share
FROM `dataset.table`
WHERE TIMESTAMP_TRUNC(`dataset.table`.event_time, DAY) = '2025-05-01'
GROUP BY TIMESTAMP_TRUNC(`dataset.table`.event_time, DAY), 
         `dataset.table`.user_id;

But I get this error:

PARTITION BY expression references `scoping_data_model.aggregates`.event_time which is neither grouped nor aggregated

How can I correctly calculate the percentage of daily events attributed to each user using a window function in BigQuery? Is it even possible, or do I have to use a subquery?


Solution

  • BigQuery does not recognize that the TIMESTAMP_TRUNC(`dataset.table`.event_time, DAY) in your PARTITION BY is the same expression you grouped by, so it treats event_time there as a column that is neither grouped nor aggregated. The same query works perfectly in PostgreSQL, for example.

    A straightforward workaround is to compute the truncated day as a regular column first, then refer only to that column.
    As a bonus you gain clarity by removing the repeated TIMESTAMP_TRUNC(…) calls.

    WITH t AS (
      -- Materialize the truncated day once as an ordinary column
      SELECT *,
             TIMESTAMP_TRUNC(event_time, DAY) AS date
      FROM `dataset.table`
      /* WHERE TIMESTAMP_TRUNC(event_time, DAY) = '2025-05-01' for your test
         (the alias "date" is not visible in this WHERE clause) */
    )
    SELECT
      date,
      user_id,
      SAFE_DIVIDE(
        SUM(event_count),                                -- events per user and day
        SUM(SUM(event_count)) OVER (PARTITION BY date)   -- events per day, all users
      ) * 100 AS event_percentage,
      SUM(event_count) AS event_share
    FROM t
    GROUP BY date,
             user_id;
    

    Alternatively, you can do the GROUP BY inside the CTE itself, which also avoids repeating SUM(event_count):

    WITH t AS (
      -- Aggregate to one row per day and user up front
      SELECT TIMESTAMP_TRUNC(event_time, DAY) AS date,
             user_id,
             SUM(event_count) AS event_count
      FROM `dataset.table`
      /* WHERE TIMESTAMP_TRUNC(event_time, DAY) = '2025-05-01' for your test */
      GROUP BY 1, 2
    )
    SELECT
      date,
      user_id,
      SAFE_DIVIDE(
        event_count,
        SUM(event_count) OVER (PARTITION BY date)   -- daily total across users
      ) * 100 AS event_percentage,
      event_count AS event_share
    FROM t;
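
    To check the arithmetic without touching your real table, here is a minimal self-contained sketch of the second query. The `sample` CTE and everything in it (the users `alice` and `bob`, the timestamps, the counts) are made up for illustration and only stand in for `dataset.table`; the column names are assumed to match yours.

    ```sql
    WITH sample AS (
      -- Hypothetical rows standing in for `dataset.table`
      SELECT * FROM UNNEST([
        STRUCT(TIMESTAMP '2025-05-01 08:00:00+00' AS event_time,
               'alice' AS user_id,
               30 AS event_count),
        STRUCT(TIMESTAMP '2025-05-01 17:30:00+00', 'alice', 45),
        STRUCT(TIMESTAMP '2025-05-01 09:15:00+00', 'bob',   25),
        STRUCT(TIMESTAMP '2025-05-02 10:00:00+00', 'alice', 10)
      ])
    ),
    t AS (
      -- One row per day and user, with the day as a plain column
      SELECT TIMESTAMP_TRUNC(event_time, DAY) AS date,
             user_id,
             SUM(event_count) AS event_count
      FROM sample
      GROUP BY 1, 2
    )
    SELECT
      date,
      user_id,
      SAFE_DIVIDE(
        event_count,
        SUM(event_count) OVER (PARTITION BY date)   -- daily total across users
      ) * 100 AS event_percentage,
      event_count AS event_share
    FROM t
    ORDER BY date, user_id;
    ```

    With these sample rows, 2025-05-01 has 100 events in total, so alice (30 + 45 = 75) should come out at 75 and bob (25) at 25. On 2025-05-02 alice is the only user, so she gets 100.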