pandasdataframevariablessum

Summing rows in a Pandas DataFrame where the number of rows summed is based on columns values in a different DataFrame


I have two Dataframes. Dataframe1(df1): Column B is initially empty and needs to be populated as below.

| A( in days) | B (sum) |
| ----------- | ------- |
| 7           |         |
| 14          |         |
| 18          |         |
| 25          |         |

df1 has two columns: Column A in days and another column B that is blank and needs to be filled with sum of rows from dataframe 2,the number of rows to be summed is based on how days translates to week in column B.

Dateframe 2 (df2):

| week | C(weekly) |
| ---- | --------- |
| WK1  | 10        |
| WK2  | 5         |
| WK3  | 7         |
| WK4  | 4         |
| WK5  | 8         |

Dataframe 2 has two columns: This dataframe has Week numbers and weekly which is sum of weekly units.

I want to fill the values in column A of dataframe 1 based on column d in Dataframe 2 like below:

For first row when A=7 days then just use WK1 values (As 7 days is 1 week so just WK1 of C from DF2) so B=10

For second row when A=14 days (2 weeks), I need WK1 and WK2 sum for C from DF2) so B=10+5=15 For third row, when A=18,I need the sum of WK1 and WK2 and 5/7(WK3) of C from DF2) so B=10+5+(4/7)*7=19 For fourth row, when A=25, I need the sum of WK1+Wk2+wk3+(4/7)*4,so B=24.28 (3 weeks plus fraction -21day + 4/7th of week4 value)

df1 (Completed):

| A( in days) | B(sum) | Methodology     |
| ----------- | ------- | --------------- |
| 7           | 10      | 10              |
| 14          | 15      | 10+5            |
| 18          | 19      | 10+5+(4/7)*7    |
| 25          | 24.28   | 10+5+7+(4/7)*4  |

I am new to python and do now know how to proceed. Please help


Solution

  • the number of weeks is equal to the number of rows (with potentially a few days missing)

    IIUC, you can use a mod operation to correct the cumsum:

    r = df1['A'].mod(7).to_numpy()
    c = df2['C'].head(len(df1))
    df1['B'] = c.cumsum().sub((7-r)/7*c.where(r>0), fill_value=0).to_numpy()
    

    Output:

        A          B
    0   7  10.000000
    1  14  15.000000
    2  18  19.000000
    3  25  24.285714
    

    the number of weeks is arbitrary

    Then we need to use a custom function

    def cust_sum(n, ref):
        d, r = divmod(n, 7)
        return ref.head(d).sum() + (ref.iat[d]*r/7 if r else 0)
    
    df1['B'] = df1['A'].apply(cust_sum, ref=df2['C'])
    

    Example:

        A          B
    0   7  10.000000
    1  14  15.000000
    2  18  19.000000
    3  29  27.142857