c++algorithm

Divide an integer range into nearly equal integer ranges


I am writing an algorithm that involves taking a set of numbers and putting them into buckets. Can you help me implement these two simple methods? Let me know if I need to explain more.

// return a vector where each element represents the
// size of the range of numbers in the corresponding bucket
// buckets should be equal in size +/- 1
// doesn't matter where the bigger/smaller buckets are
vector<int> makeBuckets(int max, int numberOfBuckets);

// return which bucket n belongs in
int whichBucket(int max, int numberOfBuckets, int n); 

Example output

makeBuckets(10, 3) == { 3, 3, 4 }; // bucket ranges: (0, 2), (3, 5), (6, 9)
whichBucket(10, 3, 0) == 0;
whichBucket(10, 3, 1) == 0;
whichBucket(10, 3, 2) == 0;
whichBucket(10, 3, 3) == 1;
whichBucket(10, 3, 4) == 1;
whichBucket(10, 3, 5) == 1;
whichBucket(10, 3, 6) == 2;
whichBucket(10, 3, 7) == 2;
whichBucket(10, 3, 8) == 2;
whichBucket(10, 3, 9) == 2;

Solution

  • Let's say you need to divide a range [0,n] into k buckets.

    1. e = n / k (integer divide!) will tell you the minimal size of each bucket.
    2. o = n % k will tell you how many buckets need to grow in size.
    3. Now loop over k:
      • If o > 0, create a bucket of size e+1, decrease o.
      • If o == 0, create a bucket of size e.

    How to best create buckets depends on the size of n. For example, if you have a small n, you could just have an array of size n that stores the bucket index for each number. In the loop above, you would fill up that array. Then the query whichBucket() would run in O(1).

    If n is large, however, this is impractical. In this case, you would do your bucket sorting completely implicitely. That means, for each incoming query, you can directly compute the corresponding bucket index using e and o.