sqlhivepercentile

Creating deciles in SQL


I'm trying to bucket my data into deciles, but not in the traditional sense where the dimension is the basis of the decile.

I have 463 unique it_scores ranging from 316-900 (my dimension) with 1,296,070 trade_counts (my measure) total. Using the following code breaks my data into 10 buckets with 47 unique it_scores:

ntile(10) over (order by it_score)) as tileno

While this is definitely doing what it's supposed to, I need my buckets to be built on the basis of total trade_counts, with each bucket containing about 129.6k observations. The it_score is still the dimension but the ranges wouldn't necessarily be equal i.e. decile 10 might have a range of 316-688 with 129.6k observations while decile 9 might be 689-712 also with 129.6k observations.

How would I achieve that?


Solution

  • SUM(trade_count) OVER (ORDER BY it_score) to assign deciles based on cumulative trade_counts.

    SELECT
      decile,
      SUM(trade_count) AS decile_trade_count
    FROM
      (
        SELECT
          it_score,
          trade_count,
          FLOOR(
            (SUM(trade_count) OVER (ORDER BY it_score) - 1) / (SUM(trade_count) OVER ()) * 10
          ) + 1 AS decile
        FROM table
      ) sub
    GROUP BY decile
    ORDER BY decile;