pythonpandasnumpy

How to add up value in column v, by multiplying previous cell with a fixed factor, then add current cell of another column


Starting with a "pd.DataFrame" df :~

         n    v
    0    1  0.0
    1    2  0.0
    2    3  0.0
    3    4  0.0
    4    5  0.0
    5    6  0.0

I'd like to add up value in column "v", whereby a cell in column "v" is produced by multiplying previous cell of "v" with a fixed factor, then add current cell value of column "n". (See sample calculation table below)

    ## sample calculation table :~
    n   v[i]      n  + v[i-1] * fixed factor 
    1   1.0     = 1  + 0.0    * 0.5
    2   2.5     = 2  + 1.0    * 0.5   
    3   4.25    = 3  + 2.5    * 0.5
    4   6.125   = 4  + 4.25   * 0.5
    5   8.0625  = 5  + 6.125  * 0.5
    6  10.03125 = 6  + 8.063  * 0.5

Managed to do it with row-by-row iteration (see for-loop below).

However I think vectorised methods (like cumsum and shift) may be more efficient, but could not figure out how; because cumsum is complicated by multiplication, starting with an empty column v, and need to reference to previous cell of a column.

Wonder how to do this with vectorised methods ?

To reproduce my code :

    df = pd.DataFrame({'n':[1,2,3,4,5,6]})
    df['v'] = 0.0

    def fnv(nu, vu):
        return nu + vu * 0.5

    for i in range(0, df.shape[0]):
        df.v.at[i] = fnv(df.n.at[i], df.v.at[i-1] if i>0 else 0.0)
    df (RESULTS) :~
         n      v
    0    1   1.00
    1    2   2.50
    2    3   4.25
    3    4   6.12
    4    5   8.06
    5    6  10.03

Solution

  • Since n is variable, you can't easily vectorize this (you could using a matrix operation, see below, but this would take O(n^2) space).

    A good tradeoff might be to use numba to speed the operation:

    from numba import jit
    
    @jit(nopython=True)
    def fnv(n, factor=0.5):
        out = []
        prev = 0
        for x in n:
            out.append(x + prev*factor)
            prev = out[-1]
        return out
    
    df['v'] = fnv(df['n'].to_numpy())
    

    Output:

       n         v
    0  1   1.00000
    1  2   2.50000
    2  3   4.25000
    3  4   6.12500
    4  5   8.06250
    5  6  10.03125
    

    vectorized approach

    You could vectorize using a square matrix:

    x = np.arange(len(df))
    tmp = x[:, None]-x
    df['v'] = np.nansum(np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy(),
                        axis=1)
    

    Intermediates:

    # tmp
    [[ 0 -1 -2 -3 -4 -5]
     [ 1  0 -1 -2 -3 -4]
     [ 2  1  0 -1 -2 -3]
     [ 3  2  1  0 -1 -2]
     [ 4  3  2  1  0 -1]
     [ 5  4  3  2  1  0]]
    
    # tmp >= 0
    [[ True False False False False False]
     [ True  True False False False False]
     [ True  True  True False False False]
     [ True  True  True  True False False]
     [ True  True  True  True  True False]
     [ True  True  True  True  True  True]]
    
    # np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy()
    [[1.          nan     nan     nan     nan     nan]
     [0.5     2.          nan     nan     nan     nan]
     [0.25    1.      3.          nan     nan     nan]
     [0.125   0.5     1.5     4.          nan     nan]
     [0.0625  0.25    0.75    2.      5.          nan]
     [0.03125 0.125   0.375   1.      2.5     6.     ]]