I am trying to distribute an integer into a column of integers, so that each row end up with values that are as close to each other as possible.
For example, given a table with columns Id and Bin, with pre-existing values:
(1, 3),
(2, 12),
(3, 6)
And @IntegerToDistribute = 7, the result I am imagining is that the row with ID 1 to get 5, and the row with ID 3 to get 2, so that the table would end up with:
(1, 8),
(2, 12),
(3, 8)
And if @IntegerToDistribute = 20, then ID 1 would have gotten 11, ID 3 had gotten 8, and ID 3 had gotten 2, to give the result
(1, 15),
(2, 14),
(3, 14)
Any solutions, or recommendations as how I should approach it?
I could obviously do this very easily as a cursor / while-loop (adding +1 at a time to the row with the smallest Bin value), but that will be very inefficient at bigger sizes of @IntegerToDistribute - so I am looking for a set based solution. (Start of edit) Also preferably avoidable due to code standards in the project I am in. I made a cursor solution previously which I ended up not going for due to this, that looks like this:
WHILE @AmountToAdd > 0
BEGIN
WITH
USR
( UserIdMember,
PreExistingAmount,
ToAdd
) AS
( SELECT TOP(1)
USR.UserIdMember,
USR.PreExistingAmount,
USR.ToAdd
FROM
#Users USR
ORDER BY
CH.PreExistingAmount + ToAdd
)
UPDATE USR
SET
USR.ToAdd= USR.ToAdd+ 1
FROM
USR
WHERE
1=1 -- condition in CTE;
SELECT
@AmountToAdd= @AmountToAdd-1;
END;
So, the solution is intended to replicate the result from the above, but not be an iterative solution. (end of edit)
I've tried with NTILE which doesn't seem to really work for this situation, and recursive CTE I cannot really wrap my head around how it would be done for this case.
Yes, This can be done using set-based logic using the following steps:
FullFillLevel
and the number of residual items ResidualItems
that need to be distributed at that full-fill-level + 1.RowNum
.TargetFillLevel
for each selected bin as follows: For bins with RowNum
<= ResidualItems
, the updated value will be FullFillLevel + 1
. For all others, the updated value would just be FullFillLevel
.FillLevel < TargetFillLevel
.Assuming a table structure like:
CREATE TABLE Bins (
BinId INT,
FillLevel INT
);
The following stored procedure will update the bin fill levels:
CREATE PROCEDURE AddToBins @NumItems INT
AS
DECLARE @MaxInt INT = 0x7FFFFFFF; -- Max INT value = 2^32 - 1 (proxy for infinity)
DECLARE @FullFillLevel INT;
DECLARE @ResidualItems INT;
WITH BinCountsByFillLevel AS (
SELECT
SUM(COUNT(*)) OVER(ORDER BY FillLevel) AS AvailableBins,
FillLevel
FROM Bins
GROUP BY FillLevel
),
FillLevelRanges AS (
SELECT
*,
ISNULL(LEAD(FillLevel) OVER(ORDER BY FillLevel), @MaxInt) AS ToFillLevel
FROM BinCountsByFillLevel
),
FillLevelRangeCapacities AS (
SELECT
FLR.*,
C.RangeCapacity,
SUM(C.RangeCapacity) OVER(ORDER BY FLR.FillLevel)
- C.RangeCapacity
AS FromRangeCapacity,
SUM(C.RangeCapacity) OVER(ORDER BY FLR.FillLevel)
AS ToRangeCapacity
FROM FillLevelRanges FLR
CROSS APPLY (
SELECT CAST((FLR.ToFillLevel - FLR.FillLevel) AS BIGINT)
* FLR.AvailableBins
AS RangeCapacity
) C
),
TargetFillLevel AS (
SELECT
FC.*,
FL.FullFillLevel,
FL.ResidualItems
FROM FillLevelRangeCapacities FC
CROSS APPLY (
SELECT
(@NumItems - FC.FromRangeCapacity) / FC.AvailableBins + FC.FillLevel
AS FullFillLevel,
(@NumItems - FC.FromRangeCapacity) % FC.AvailableBins
AS ResidualItems
) FL
WHERE FromRangeCapacity < @NumItems
AND ToRangeCapacity >= @NumItems
)
SELECT @FullFillLevel = FullFillLevel, @ResidualItems = ResidualItems
FROM TargetFillLevel
;
-- Debug print
SELECT @NumItems AS NumItems, @FullFillLevel AS FullFillLevel, @ResidualItems AS ResidualItems;
WITH OrderedBins AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY BinId) RN
FROM Bins
WHERE FillLevel <= @FullFillLevel
)
UPDATE OB
SET FillLevel = T.TargetFillLevel
FROM OrderedBins OB
CROSS APPLY (
SELECT @FullFillLevel
+ CASE WHEN RN <= @ResidualItems THEN 1 ELSE 0 END
AS TargetFillLevel
) T
WHERE OB.FillLevel < T.TargetFillLevel
;
There may be some ways to consolidate some the above logic, but it works in its current form.
Results (with extra bins and extra test cases):
BinId | Initial FillLevel |
+7 | +20 | +21 | +22 | +23 | +52 | +999 | +2147483647 Max INT value |
---|---|---|---|---|---|---|---|---|---|
1 | 3 | 8 | 14 | 14 | 15 | 15 | 23 | 181 | 357913956 |
2 | 12 | 12 | 14 | 14 | 14 | 15 | 23 | 181 | 357913956 |
3 | 6 | 8 | 13 | 14 | 14 | 14 | 23 | 181 | 357913956 |
4 | 20 | 20 | 20 | 20 | 20 | 20 | 22 | 181 | 357913955 |
5 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 181 | 357913955 |
6 | 20 | 20 | 20 | 20 | 20 | 20 | 22 | 180 | 357913955 |
Total | 86 | 93 | 106 | 107 | 108 | 109 | 138 | 1085 | 2147483733 |
See this db<>fiddle for a demo.
Another explanation:
Consider an initial condition where we have six bins with fill levels of [3, 12, 6, 20, 25, 20]
. This can be visualized as follows:
Fill Range | Bin 1 | Bin 2 | Bin 3 | Bin 4 | Bin 5 | Bin 6 |
---|---|---|---|---|---|---|
25 to Inf | . | . | . | . | . | . |
20 to 25 | . | . | . | . | X | . |
12 to 20 | . | . | . | X | X | X |
6 to 12 | . | X | . | X | X | X |
3 to 6 | . | X | X | X | X | X |
0 to 3 | X | X | X | X | X | X |
Here the X
represents the fill and .
represents available capacity.
If we reorder the bins by ascending fill level, we can conceptually divide the available capacity into a series of rectangles whose width is the number of available bins at each level and the height is the fill capacity range for that set of bins.
Fill Range |
Bin 1 |
Bin 3 |
Bin 2 |
Bin 4 |
Bin 6 |
Bin 5 |
Available Bin Count |
Range Size |
Range Capacity (Rectangle) |
Cumulative Available Capacity |
---|---|---|---|---|---|---|---|---|---|---|
25 to Inf | <-- | --- | --- | --- | --- | --> | 6 | Inf | 6 * Inf = Inf | 64 to Inf |
20 to 25 | <-- | --- | --- | --- | --> | X | 5 | 5 | 5 * 5 = 25 | 39 to 64 |
12 to 20 | <-- | --- | --> | X | X | X | 3 | 8 | 3 * 8 = 24 | 15 to 39 |
6 to 12 | <-- | --> | X | X | X | X | 2 | 6 | 2 * 6 = 12 | 3 to 15 |
3 to 6 | <-> | X | X | X | X | X | 1 | 3 | 1 * 3 = 3 | 0 to 3 |
0 to 3 | X | X | X | X | X | X | 0 | 3 | 0 * 3 = 0 | 0 |
(SO table syntax does not support me showing the merged cells above, but consider a horizontal series of <-...->
cells to be merged into a rectangle.)
Now given the number of items to add, we need to pick the row that has a Cumulative Available Capacity that includes that value. All lower ranges will be fully filled and the selected range will be partially filled. It is then a matter of calculating how much to allocate within the selected range. Some bins might receive one more item than the others.
Example 1: If adding 7
items, we will select the 6 to 12
fill level range because 7
falls into the 3 to 15
Cumulative Available Capacity range. If we subtract the lower value of the capacity range 3
from the number of items 7
, this leaves is with 4
items to allocate at this level. The number of available bins at this level is 2
, so we can allocate 4 / 2 = 2
items each across all two available bins. Adding that to the start of that fill range '6', gives us a new fill level of 8
for those two bins.
Example 2: If adding 20
items, we will select the 12 to 20
fill level range because 20
falls into the 15 to 39
Cumulative Available Capacity range. If we subtract the lower value of the capacity range 15
from the number of items 20
, this leaves is with 5
items to allocate at this level. The number of available bins at this level is 3
, so we can allocate 5 / 3 = 1
item each across all three available bins with 2
residual items. Adding that to the start of that fill range '12', gives us a new fill level of 13
for those three bins. The 2
residual items increases the fill level to 14
for two of those bins.