Starting with a "pd.DataFrame" df :~
n v
0 1 0.0
1 2 0.0
2 3 0.0
3 4 0.0
4 5 0.0
5 6 0.0
I'd like to add up value in column "v", whereby a cell in column "v" is produced by multiplying previous cell of "v" with a fixed factor, then add current cell value of column "n". (See sample calculation table below)
## sample calculation table :~
n v[i] n + v[i-1] * fixed factor
1 1.0 = 1 + 0.0 * 0.5
2 2.5 = 2 + 1.0 * 0.5
3 4.25 = 3 + 2.5 * 0.5
4 6.125 = 4 + 4.25 * 0.5
5 8.0625 = 5 + 6.125 * 0.5
6 10.03125 = 6 + 8.063 * 0.5
Managed to do it with row-by-row iteration (see for-loop below).
However I think vectorised methods (like cumsum and shift) may be more efficient, but could not figure out how; because cumsum is complicated by multiplication, starting with an empty column v, and need to reference to previous cell of a column.
Wonder how to do this with vectorised methods ?
df = pd.DataFrame({'n':[1,2,3,4,5,6]})
df['v'] = 0.0
def fnv(nu, vu):
return nu + vu * 0.5
for i in range(0, df.shape[0]):
df.v.at[i] = fnv(df.n.at[i], df.v.at[i-1] if i>0 else 0.0)
df (RESULTS) :~
n v
0 1 1.00
1 2 2.50
2 3 4.25
3 4 6.12
4 5 8.06
5 6 10.03
Since n
is variable, you can't easily vectorize this (you could using a matrix operation, see below, but this would take O(n^2) space).
A good tradeoff might be to use numba to speed the operation:
from numba import jit
@jit(nopython=True)
def fnv(n, factor=0.5):
out = []
prev = 0
for x in n:
out.append(x + prev*factor)
prev = out[-1]
return out
df['v'] = fnv(df['n'].to_numpy())
Output:
n v
0 1 1.00000
1 2 2.50000
2 3 4.25000
3 4 6.12500
4 5 8.06250
5 6 10.03125
You could vectorize using a square matrix:
x = np.arange(len(df))
tmp = x[:, None]-x
df['v'] = np.nansum(np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy(),
axis=1)
Intermediates:
# tmp
[[ 0 -1 -2 -3 -4 -5]
[ 1 0 -1 -2 -3 -4]
[ 2 1 0 -1 -2 -3]
[ 3 2 1 0 -1 -2]
[ 4 3 2 1 0 -1]
[ 5 4 3 2 1 0]]
# tmp >= 0
[[ True False False False False False]
[ True True False False False False]
[ True True True False False False]
[ True True True True False False]
[ True True True True True False]
[ True True True True True True]]
# np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy()
[[1. nan nan nan nan nan]
[0.5 2. nan nan nan nan]
[0.25 1. 3. nan nan nan]
[0.125 0.5 1.5 4. nan nan]
[0.0625 0.25 0.75 2. 5. nan]
[0.03125 0.125 0.375 1. 2.5 6. ]]