What would be the best way to bucket records into batches of a predefined size? I would like to tag each record with a batch/bucket number for further processing.
For example, let's say I have 1110 records and the batch/bucket size is 200, I should end up with either:
The background/context is for each batch to then be processed by an external service, which has a maximum allowed batch size (which varies per service).
I figured out a way to do it using the NTILE window functions:
WITH records AS (
SELECT RANDOM() AS value
FROM TABLE (generator(rowcount => 1110))
),
batches AS (
SELECT
value,
NTILE(
(SELECT COUNT(*) FROM records)/200
) OVER (
ORDER BY NULL
) AS batch
FROM records
)
SELECT batch, COUNT(*)
FROM batches
GROUP BY ALL;
Result:
My first approach was to use ROW_NUMBER(), but there's a bit more overhead with that approach (although the result is the same):
WITH records AS (
SELECT RANDOM() AS value
FROM TABLE (generator(rowcount => 1110))
),
batches AS (
SELECT
value,
ROW_NUMBER() OVER (
ORDER BY NULL
) AS rn,
(rn % CEIL((SELECT COUNT(*) FROM records)/200)) AS batch
FROM records
)
SELECT batch, COUNT(*)
FROM batches
GROUP BY ALL;