I'm trying to bucket my data into deciles, but not in the traditional sense where the dimension is the basis of the decile.
I have 463 unique it_scores ranging from 316-900 (my dimension) with 1,296,070 trade_counts (my measure) total. Using the following code breaks my data into 10 buckets with 47 unique it_scores:
ntile(10) over (order by it_score)) as tileno
While this is definitely doing what it's supposed to, I need my buckets to be built on the basis of total trade_counts, with each bucket containing about 129.6k observations. The it_score is still the dimension but the ranges wouldn't necessarily be equal i.e. decile 10 might have a range of 316-688 with 129.6k observations while decile 9 might be 689-712 also with 129.6k observations.
How would I achieve that?
SUM(trade_count) OVER (ORDER BY it_score)
to assign deciles based on cumulative trade_counts
.
SELECT
decile,
SUM(trade_count) AS decile_trade_count
FROM
(
SELECT
it_score,
trade_count,
FLOOR(
(SUM(trade_count) OVER (ORDER BY it_score) - 1) / (SUM(trade_count) OVER ()) * 10
) + 1 AS decile
FROM table
) sub
GROUP BY decile
ORDER BY decile;