sqlgoogle-bigquerysetlogicset-theory

SQL, Select Nth quintile for several categorical values


Hey StackOverflow SQL and set theory gurus!

Take dummy table below. For each fruit type, I am looking to find lowest value in the 4th quintile.

Fruit |Size|Fruit_ID|
Apple |10  |1       |
Apple |12  |2       |
Apple |11  |3       |
Pear  |13  |4       |
Pear  |12  |5       |
Pear  |11  |6       |
Pear  |10  |7       |
Apple |15  |8       |

My current approach is

WITH quantiles AS (
SELECT
 NTILE(4) OVER(ORDER BY Size) as quantile,
 Fruit, 
 Size
FROM
 DUMMY_TABLE
),

SELECT
 MIN(Size),
 Fruit
FROM
 quantiles
WHERE
 quantile = 4
GROUP BY
 Fruit

This approach of course gives me the lowest value in 4th quantile for all fruits instead of one for each fruit type.

Any guidance on how I can adapt the above query to calculate quantiles for each fruit type instead of for all fruits?

I am working in Google's BigQuery.

Thank you kindly!


Solution

  • I can see that you are ordering by sales in the Ntile windows function but there is no sales column in the dummy data provided in the question. Assuming this is what you want to do, I have included a dummy sales column and partitioned the Ntile function by Fruit.

    WITH quantiles AS (
    SELECT
     NTILE(4) OVER(PARTITION BY Fruit ORDER BY sales) AS quantile,
     Fruit, 
     Size
    FROM DUMMY_TABLE
    )
    SELECT
     MIN(Size) AS Lowest_Value,
     Fruit
    FROM quantiles
    WHERE quantile = 4
    GROUP BY Fruit;
    

    See Demo

    Result

    Lowest_Value |Fruit|
        10       |Apple|
        10       |Pear |