I wrote the following SQL query to find out orders whose maximum quantity is strictly greater than the average quantity of every order (including itself). One order consists of multiple products bought, and hence a quantity corresponding to each product. Refer the table in attached image
SQL code:
SELECT DISTINCT
t.order_id
FROM
(SELECT
order_id,
MAX(quantity) OVER (PARTITION BY order_id) AS max_qty,
AVG(quantity) OVER (PARTITION BY order_id) AS avg_qty,
MAX(AVG(quantity) OVER (PARTITION BY order_id)) OVER() AS max_avg_qty
FROM
ordersdetails) AS t
WHERE
max_qty > max_avg_qty
ORDER BY
order_id
It throws an error:
Analytic function cannot be an argument of another analytic function
What does this mean?
I tried to get the answer using window functions. Using the window function:
avg(quantity) over(partition by order_id) as avg_qty,
I calculated average quantity bought for every order, then I put this
inside another window function max() over()
in order to get maximum value among all the average quantities. But it didn't work.
You have a couple of operations that are stepping on each other's feet, from an order-of-operations standpoint.
There are three ways in Bigquery sql to filter a result set. In order of their execution:
WHERE
clause: This step is executed very early in query execution, often as a first step when selecting data from disk. This can't filter data that results from an aggregation either through a GROUP BY
or a Window Function.HAVING
clause: This step is executed after GROUP BY
aggregation and can be used to filter the result set based on that results of the GROUP BY
aggregation but can't be used to filter the result of a window function.QUALIFY
clause: This step is executed at the very end of execution after Window Functions have been run against the result set.Because you want to filter based on the results of window functions, a QUALIFY
clause is the route you want to aim for.
To further complicate things, you are wanting to run a window function against data that is determined by another window function. This simply can't happen without a subquery as the step is only ran once per select.
Accounting for order of operations and limitations in step/processing that you have to deal with, your sql will look like:
SELECT DISTINCT t.order_id
FROM
(SELECT
order_id,
MAX(quantity) OVER (PARTITION BY order_id) AS max_qty,
AVG(quantity) OVER (PARTITION BY order_id) AS avg_qty
FROM
ordersdetails) AS t
QUALIFY max_qty > MAX(avg_qty) OVER ()
ORDER BY order_id