pythonpandasdataframenumpysum

Sum rows from a column based on a condition and output them in a new column


In my dataframe I want to sum certain rows in a column and output them in a new column 'UE_more_days'

is

    ATEXT   BEGUZ_UE   UE_more_days
0             11.00            0.0
1     CT      23.00            Nan
2     RT      33.00           46.0
3             15.00            0.0
3             15.00            0.0
4             12.75            0.0
5             19.75            0.0
6             14.75            0.0
7     CT      23.00           29.5
8     CT      24.00           46.0
9     CT      24.00           48.0
10    RT      33.00           48.0
11            15.00            0.0
12
etc

should be

    ATEXT   BEGUZ_UE   UE_more_days
0
1     CT      23.00
2     RT      33.00           56.0
3             15.00
4             12.75
5             19.75
6             14.75
7     CT      23.00
8     CT      24.00
9     CT      24.00
10    RT      33.00          104.0
11            15.00
12
etc

should be 2

    ATEXT   BEGUZ_UE    subtract      add      UE_more_days  is_m_days
0             11.00     *0.00*        *3.92*
1     CT      *23.00*    0.00         0.00
2     RT      *33.00*    0.00         0.00          56.0
3             *15.00*    0.20         0.00                      *74.92*
4             12.75         
5             19.75
6             14.75     *2.00*       *0.00*
7     CT      *23.00*
8     CT      *24.00*
9     CT      *24.00*
10    TT      *33.00*                              104.0
11            *15.00*    0.00         3.57                     *117.00*
12
etc

my last try

bedd2 = [(df['ATEXT'] != ''),]
result2 = [(df.iloc[0:]['BEGUZ_UE'].astype(float).reset_index(drop=True) +
df.iloc[1:]['BEGUZ_UE'].astype(float)).round(decimals=2).shift(1)]
df['min_UE_mehr_Tage'] = np.select(bedd2, result2)

How can I sum rows from a column based on a condition and output them in a new column?


Solution

  • Replace with Nan the values in BEGUZ_UE where ATEXT is equal to "", and after that create groups based on the same condition.

    m = df['ATEXT'].eq("")
    cond = (~m) & m.shift(-1)
    df['UE_more_days'] = (df['BEGUZ_UE'].mask(m)
                          .groupby(m.cumsum()).cumsum()
                          .where(cond)
                         )
    tmv = (df[['subtract', 'add']]
           .shift()
           .groupby(m.cumsum())
           .transform('max')
           .eval('add-subtract')
          )
    
    
    df['m_days'] = (df.groupby(m[::-1].cumsum())['BEGUZ_UE']
                    .transform('sum')
                    .add(tmv)
                    .where(cond)
                    .shift()
                   )
    

    End result:

       ATEXT  BEGUZ_UE  subtract   add  UE_more_days  m_days
    0            11.00       0.0  3.92           NaN     NaN
    1     CT     23.00       0.0  0.00           NaN     NaN
    2     RT     33.00       0.0  0.00          56.0     NaN
    3            15.00       0.2  0.00           NaN   74.92
    4            12.75       NaN   NaN           NaN     NaN
    5            19.75       NaN   NaN           NaN     NaN
    6            14.75       2.0  0.00           NaN     NaN
    7     CT     23.00       NaN   NaN           NaN     NaN
    8     CT     24.00       NaN   NaN           NaN     NaN
    9     CT     24.00       NaN   NaN           NaN     NaN
    10    TT     33.00       NaN   NaN         104.0     NaN
    11           15.00       0.0  3.57           NaN  117.00
    

    Dataset I used:

    data = {'ATEXT': ['', 'CT', 'RT', '', '', '', '', 'CT', 'CT', 'CT', 'TT', ''], 
            'BEGUZ_UE': [11.0, 23.0, 33.0, 15.0, 12.75, 19.75, 14.75, 23.0, 
                         24.0, 24.0, 33.0, 15.0], 
            'subtract': [0.0, 0.0, 0.0, 0.2, np.nan, np.nan, 2.0, np.nan, 
                         np.nan, np.nan, np.nan, 0.0], 
            'add': [3.92, 0.0, 0.0, 0.0, np.nan, np.nan, 0.0, np.nan, np.nan, 
                    np.nan, np.nan, 3.57], 
            'UE_more_days': [np.nan, np.nan, 56.0, np.nan, np.nan, np.nan, np.nan, 
                             np.nan, np.nan, np.nan, 104.0, np.nan]}