sqlsql-servert-sql

Distribute value over rows in a column based on pre-existing value


I am trying to distribute an integer into a column of integers, so that each row end up with values that are as close to each other as possible.

For example, given a table with columns Id and Bin, with pre-existing values:

(1, 3),
(2, 12),
(3, 6)

And @IntegerToDistribute = 7, the result I am imagining is that the row with ID 1 to get 5, and the row with ID 3 to get 2, so that the table would end up with:

(1, 8),
(2, 12),
(3, 8)

And if @IntegerToDistribute = 20, then ID 1 would have gotten 11, ID 3 had gotten 8, and ID 3 had gotten 2, to give the result

(1, 15),
(2, 14),
(3, 14)

Any solutions, or recommendations as how I should approach it?

I could obviously do this very easily as a cursor / while-loop (adding +1 at a time to the row with the smallest Bin value), but that will be very inefficient at bigger sizes of @IntegerToDistribute - so I am looking for a set based solution. (Start of edit) Also preferably avoidable due to code standards in the project I am in. I made a cursor solution previously which I ended up not going for due to this, that looks like this:

WHILE @AmountToAdd > 0 
BEGIN   
   WITH     
   USR  
   (    UserIdMember,       
        PreExistingAmount,
        ToAdd
    ) AS 
    (   SELECT TOP(1)
            USR.UserIdMember,
            USR.PreExistingAmount,
            USR.ToAdd
        FROM
            #Users USR
        ORDER BY
            CH.PreExistingAmount + ToAdd
    )
    UPDATE USR
    SET
        USR.ToAdd= USR.ToAdd+ 1
    FROM
        USR
    WHERE
            1=1 -- condition in CTE;
    SELECT
        @AmountToAdd= @AmountToAdd-1; 
END;

So, the solution is intended to replicate the result from the above, but not be an iterative solution. (end of edit)

I've tried with NTILE which doesn't seem to really work for this situation, and recursive CTE I cannot really wrap my head around how it would be done for this case.


Solution

  • Yes, This can be done using set-based logic using the following steps:

    1. Group and count the current bins by their current fill level.
    2. With the results of the prior step ordered by fill level, calculate (a) the cumulative number of bins open at each fill level, (b) the distance difference successive fill levels, (c) the available capacity of each fill level range (bin count * range size), (d) the cumulative capacity across each fill level range.
    3. Select the range that can accommodate the number of items to be added.
    4. Based on the lower limit of that range and the number of bins available in that range, calculate (a) the target full fill level FullFillLevel and the number of residual items ResidualItems that need to be distributed at that full-fill-level + 1.
    5. Select all bins with a current fill level <= the calculated target full fill level and assign them a number RowNum.
    6. Calculate an updated fill level TargetFillLevel for each selected bin as follows: For bins with RowNum <= ResidualItems, the updated value will be FullFillLevel + 1. For all others, the updated value would just be FullFillLevel.
    7. Apply updates for all rows where FillLevel < TargetFillLevel.

    Assuming a table structure like:

    CREATE TABLE Bins (
        BinId INT,
        FillLevel INT
    );
    

    The following stored procedure will update the bin fill levels:

    CREATE PROCEDURE AddToBins @NumItems INT
    AS
    
    DECLARE @MaxInt INT = 0x7FFFFFFF;  -- Max INT value = 2^32 - 1 (proxy for infinity)
    
    DECLARE @FullFillLevel INT;
    DECLARE @ResidualItems INT;
    
    WITH BinCountsByFillLevel AS (
        SELECT
            SUM(COUNT(*)) OVER(ORDER BY FillLevel) AS AvailableBins,
            FillLevel
        FROM Bins
        GROUP BY FillLevel
    ),
    FillLevelRanges AS (
        SELECT
            *,
            ISNULL(LEAD(FillLevel) OVER(ORDER BY FillLevel), @MaxInt) AS ToFillLevel
        FROM BinCountsByFillLevel
    ),
    FillLevelRangeCapacities AS (
        SELECT
            FLR.*,
            C.RangeCapacity,
            SUM(C.RangeCapacity) OVER(ORDER BY FLR.FillLevel)
                - C.RangeCapacity
                AS FromRangeCapacity,
            SUM(C.RangeCapacity) OVER(ORDER BY FLR.FillLevel)
                AS ToRangeCapacity
        FROM FillLevelRanges FLR
        CROSS APPLY (
            SELECT CAST((FLR.ToFillLevel - FLR.FillLevel) AS BIGINT)
                   * FLR.AvailableBins
                   AS RangeCapacity
        ) C
    ),
    TargetFillLevel AS (
        SELECT
            FC.*,
            FL.FullFillLevel,
            FL.ResidualItems
        FROM FillLevelRangeCapacities FC
        CROSS APPLY (
            SELECT
                (@NumItems - FC.FromRangeCapacity) / FC.AvailableBins + FC.FillLevel
                    AS FullFillLevel,
                (@NumItems - FC.FromRangeCapacity) % FC.AvailableBins
                    AS ResidualItems
        ) FL
        WHERE FromRangeCapacity < @NumItems
        AND ToRangeCapacity >= @NumItems
    )
    SELECT @FullFillLevel = FullFillLevel, @ResidualItems = ResidualItems
    FROM TargetFillLevel
    ;
    
    -- Debug print
    SELECT @NumItems AS NumItems, @FullFillLevel AS FullFillLevel, @ResidualItems AS ResidualItems;
    
    WITH OrderedBins AS (
        SELECT *, ROW_NUMBER() OVER(ORDER BY BinId) RN
        FROM Bins
        WHERE FillLevel <= @FullFillLevel
    )
    UPDATE OB
    SET FillLevel = T.TargetFillLevel
    FROM OrderedBins OB
    CROSS APPLY (
        SELECT @FullFillLevel
                   + CASE WHEN RN <= @ResidualItems THEN 1 ELSE 0 END
                   AS TargetFillLevel
    ) T
    WHERE OB.FillLevel < T.TargetFillLevel
    ;
    

    There may be some ways to consolidate some the above logic, but it works in its current form.

    Results (with extra bins and extra test cases):

    BinId Initial
    FillLevel
    +7 +20 +21 +22 +23 +52 +999 +2147483647
    Max INT value
    1 3 8 14 14 15 15 23 181 357913956
    2 12 12 14 14 14 15 23 181 357913956
    3 6 8 13 14 14 14 23 181 357913956
    4 20 20 20 20 20 20 22 181 357913955
    5 25 25 25 25 25 25 25 181 357913955
    6 20 20 20 20 20 20 22 180 357913955
    Total 86 93 106 107 108 109 138 1085 2147483733

    See this db<>fiddle for a demo.

    Another explanation:

    Consider an initial condition where we have six bins with fill levels of [3, 12, 6, 20, 25, 20]. This can be visualized as follows:

    Fill Range Bin 1 Bin 2 Bin 3 Bin 4 Bin 5 Bin 6
    25 to Inf . . . . . .
    20 to 25 . . . . X .
    12 to 20 . . . X X X
    6 to 12 . X . X X X
    3 to 6 . X X X X X
    0 to 3 X X X X X X

    Here the X represents the fill and . represents available capacity.

    If we reorder the bins by ascending fill level, we can conceptually divide the available capacity into a series of rectangles whose width is the number of available bins at each level and the height is the fill capacity range for that set of bins.

    Fill
    Range
    Bin
    1
    Bin
    3
    Bin
    2
    Bin
    4
    Bin
    6
    Bin
    5
    Available
    Bin
    Count
    Range
    Size
    Range
    Capacity
    (Rectangle)
    Cumulative
    Available
    Capacity
    25 to Inf <-- --- --- --- --- --> 6 Inf 6 * Inf = Inf 64 to Inf
    20 to 25 <-- --- --- --- --> X 5 5 5 * 5 = 25 39 to 64
    12 to 20 <-- --- --> X X X 3 8 3 * 8 = 24 15 to 39
    6 to 12 <-- --> X X X X 2 6 2 * 6 = 12 3 to 15
    3 to 6 <-> X X X X X 1 3 1 * 3 = 3 0 to 3
    0 to 3 X X X X X X 0 3 0 * 3 = 0 0

    (SO table syntax does not support me showing the merged cells above, but consider a horizontal series of <-...-> cells to be merged into a rectangle.)

    Now given the number of items to add, we need to pick the row that has a Cumulative Available Capacity that includes that value. All lower ranges will be fully filled and the selected range will be partially filled. It is then a matter of calculating how much to allocate within the selected range. Some bins might receive one more item than the others.

    Example 1: If adding 7 items, we will select the 6 to 12 fill level range because 7 falls into the 3 to 15 Cumulative Available Capacity range. If we subtract the lower value of the capacity range 3 from the number of items 7, this leaves is with 4 items to allocate at this level. The number of available bins at this level is 2, so we can allocate 4 / 2 = 2 items each across all two available bins. Adding that to the start of that fill range '6', gives us a new fill level of 8 for those two bins.

    Example 2: If adding 20 items, we will select the 12 to 20 fill level range because 20 falls into the 15 to 39 Cumulative Available Capacity range. If we subtract the lower value of the capacity range 15 from the number of items 20, this leaves is with 5 items to allocate at this level. The number of available bins at this level is 3, so we can allocate 5 / 3 = 1 item each across all three available bins with 2 residual items. Adding that to the start of that fill range '12', gives us a new fill level of 13 for those three bins. The 2 residual items increases the fill level to 14 for two of those bins.