pythonpython-polarspolars

distribute value to fill and unfill based on a given condition


The problem:

I want to distribute a value that can be positive or negative from one row into multiple rows, where each row can only contain a specific amount, if the value to distribute is positive it fills rows, if it is negative it "unfills". It is possible to fill/unfill partially a row and the order of filling do matter, the first empty or not totally full row is the first you fill, the last filled or partially filled is the first you unfill or partially unfill.

The expected behavior:

Here is an example of what I expect: first comes the table of payments where there will be a single record per acc and its respective val to distribute.

| acc |   val |
|-----|-------|
|   1 |  -100 |
|   2 |  -123 |
|   3 |   -75 |
|   4 |  -300 |
|   5 |   -77 |
|   6 |  -500 |
|   7 |   111 |
|   8 |   123 |
|   9 |   300 |
|  10 |    75 |

then here is the table of bills to fill or unfill:

| acc |  val |  sum |
|-----|------|------|
|   1 |  100 |  100 |
|   1 |  100 |   50 |
|   1 |  100 |    0 |
|   2 |  123 |  123 |
|   2 |  150 |    0 |
|   2 |  123 |    0 |
|   3 |   70 |   70 |
|   3 |   70 |    5 |
|   4 |  150 |  150 |
|   4 |  100 |  100 |
|   4 |  150 |  150 |
|   5 |   77 |   78 |
|   6 |  500 |  500 |
|   6 |  500 |  500 |
|   6 |  500 |  500 |
|   6 |  500 |  500 |
|   7 |  100 |   10 |
|   7 |  100 |    0 |
|   7 |  100 |    0 |
|   8 |  123 |    0 |
|   8 |  123 |    0 |
|   8 |  123 |    0 |
|   9 |  100 |    0 |
|   9 |  150 |    0 |
|   9 |  200 |    0 |
|  10 |   75 |   75 |
|  10 |   75 |    0 |

while this is my expected output:

| acc |  val |  sum |
|-----|------|------|
|   1 |  100 |   50 |
|   1 |  100 |    0 |
|   1 |  100 |    0 |
|   2 |  123 |    0 |
|   2 |  150 |    0 |
|   2 |  123 |    0 |
|   3 |   70 |    0 |
|   3 |   70 |    0 |
|   4 |  150 |  100 |
|   4 |  100 |    0 |
|   4 |  150 |    0 |
|   5 |   77 |    1 |
|   6 |  500 |  500 |
|   6 |  500 |  500 |
|   6 |  500 |  500 |
|   6 |  500 |    0 |
|   7 |  100 |  100 |
|   7 |  100 |   22 |
|   7 |  100 |    0 |
|   8 |  123 |  123 |
|   8 |  123 |    0 |
|   8 |  123 |    0 |
|   9 |  100 |  100 |
|   9 |  150 |  150 |
|   9 |  200 |   50 |
|  10 |   75 |   75 |
|  10 |   75 |   75 |

What I have tried:

this is my current approach but i have been struggling to get it to work, column res is where i am storing the expected sum

import polars as pl

pl.Config.set_tbl_rows(100)
pl.Config.set_tbl_cols(20)
    
left = pl.DataFrame({
    "acc": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "val": [-100, -123, -75, -300, -77, -500, 111, 123, 300, 75]
})
    
right = pl.DataFrame({
    "acc": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10],
    "val": [100, 100, 100, 123, 150, 123, 70, 70, 150, 100, 150, 77, 500, 500, 500, 500, 100, 100, 100, 123, 123, 123, 100, 150, 200, 10, 10],
    "sum": [100, 50, 0, 123, 0, 0, 70, 5, 150, 100, 150, 78, 500, 500, 500, 500, 10, 0, 0, 123, 0, 0, 0, 0, 0, 0, 0]
})
    
df = right.join(left.rename({'val': 'bill'}), on='acc')
    
bill = pl.col('bill')
available = pl.col('val').cum_sum().over('acc')
covered = pl.col('sum').cum_sum(reverse=True).over('acc')
    
val = pl.when(bill < 0).then(covered + bill).otherwise(available - bill)
    
results = df.with_columns(bill=bill, available=available, covered=covered, res=val)
    
print(results)

Solution

  • For each group defined by acc, we can compute the total value as the sum of all payments (sum column in right) and the new payment (val column in left) in the group.

    Given this total amount, we can distribute it across rows in the group. Particularly, we take the total value and subtract the amount that could've already been distributed across previous rows in the group (computed as used_expr).

    Clipping is used to ensure a row's value is always between 0 and val.

    For instructive purposes, I've also added the columns total and used to the dataframe below. However, these are optional to include and can be dropped by removing the first .with_columns(...)-block.

    total_expr = pl.col("sum").sum() + pl.col("val_new")
    used_expr = pl.col("val").cum_sum().shift(1).fill_null(0)
    
    (
        right
        .join(left, on="acc", suffix="_new")
        .with_columns(
            total_expr.over("acc").alias("total"),
            used_expr.over("acc").alias("used"),
        )
        .with_columns(
            (total_expr - used_expr).clip(0, "val").over("acc").alias("res")
        )
    )
    

    Note. The output differs from the expected outcome only in cases where the input dataframes don't match the tables shown in the question.

    shape: (27, 7)
    ┌─────┬─────┬─────┬─────────┬───────┬──────┬─────┐
    │ acc ┆ val ┆ sum ┆ val_new ┆ total ┆ used ┆ res │
    │ --- ┆ --- ┆ --- ┆ ---     ┆ ---   ┆ ---  ┆ --- │
    │ i64 ┆ i64 ┆ i64 ┆ i64     ┆ i64   ┆ i64  ┆ i64 │
    ╞═════╪═════╪═════╪═════════╪═══════╪══════╪═════╡
    │ 1   ┆ 100 ┆ 100 ┆ -100    ┆ 50    ┆ 0    ┆ 50  │
    │ 1   ┆ 100 ┆ 50  ┆ -100    ┆ 50    ┆ 100  ┆ 0   │
    │ 1   ┆ 100 ┆ 0   ┆ -100    ┆ 50    ┆ 200  ┆ 0   │
    │ 2   ┆ 123 ┆ 123 ┆ -123    ┆ 0     ┆ 0    ┆ 0   │
    │ 2   ┆ 150 ┆ 0   ┆ -123    ┆ 0     ┆ 123  ┆ 0   │
    │ 2   ┆ 123 ┆ 0   ┆ -123    ┆ 0     ┆ 273  ┆ 0   │
    │ 3   ┆ 70  ┆ 70  ┆ -75     ┆ 0     ┆ 0    ┆ 0   │
    │ 3   ┆ 70  ┆ 5   ┆ -75     ┆ 0     ┆ 70   ┆ 0   │
    │ 4   ┆ 150 ┆ 150 ┆ -300    ┆ 100   ┆ 0    ┆ 100 │
    │ 4   ┆ 100 ┆ 100 ┆ -300    ┆ 100   ┆ 150  ┆ 0   │
    │ 4   ┆ 150 ┆ 150 ┆ -300    ┆ 100   ┆ 250  ┆ 0   │
    │ 5   ┆ 77  ┆ 78  ┆ -77     ┆ 1     ┆ 0    ┆ 1   │
    │ 6   ┆ 500 ┆ 500 ┆ -500    ┆ 1500  ┆ 0    ┆ 500 │
    │ 6   ┆ 500 ┆ 500 ┆ -500    ┆ 1500  ┆ 500  ┆ 500 │
    │ 6   ┆ 500 ┆ 500 ┆ -500    ┆ 1500  ┆ 1000 ┆ 500 │
    │ 6   ┆ 500 ┆ 500 ┆ -500    ┆ 1500  ┆ 1500 ┆ 0   │
    │ 7   ┆ 100 ┆ 10  ┆ 111     ┆ 121   ┆ 0    ┆ 100 │
    │ 7   ┆ 100 ┆ 0   ┆ 111     ┆ 121   ┆ 100  ┆ 21  │
    │ 7   ┆ 100 ┆ 0   ┆ 111     ┆ 121   ┆ 200  ┆ 0   │
    │ 8   ┆ 123 ┆ 123 ┆ 123     ┆ 246   ┆ 0    ┆ 123 │
    │ 8   ┆ 123 ┆ 0   ┆ 123     ┆ 246   ┆ 123  ┆ 123 │
    │ 8   ┆ 123 ┆ 0   ┆ 123     ┆ 246   ┆ 246  ┆ 0   │
    │ 9   ┆ 100 ┆ 0   ┆ 300     ┆ 300   ┆ 0    ┆ 100 │
    │ 9   ┆ 150 ┆ 0   ┆ 300     ┆ 300   ┆ 100  ┆ 150 │
    │ 9   ┆ 200 ┆ 0   ┆ 300     ┆ 300   ┆ 250  ┆ 50  │
    │ 10  ┆ 10  ┆ 0   ┆ 75      ┆ 75    ┆ 0    ┆ 10  │
    │ 10  ┆ 10  ┆ 0   ┆ 75      ┆ 75    ┆ 10   ┆ 10  │
    └─────┴─────┴─────┴─────────┴───────┴──────┴─────┘