pythonpandasloops

Referencing prior rows and multiple columns in a calculation


I have a dataframe that looks like the below. You can see that, to start, the SMA column is a duplicate of the Index Value column intentionally. I'm trying to adjust the SMA column so that it will be the result of a simple calculation, something to the effect of the below, which I believe would require a loop of some kind:

for i in range(len(df)):
    df['SMA'].loc[i,"SMA"]= df['SMA'].loc[i-1,"SMA"] + df['SMA'].loc[i,"ER"] * (df['SMA'].loc[i,"Index Value"] - df['SMA'].loc[i-1,"SMA"])

In words, I'm attempting to take the current row of 'SMA' and have it be the output of the prior row of "SMA" + the current row of "ER" multiplied by the difference between the current row of "Index Value" and prior row of "SMA"

data = {
    'Date': ['9/5/2023', '9/6/2023', '9/7/2023', '9/8/2023', '9/11/2023', '9/12/2023', 
             '9/13/2023', '9/14/2023', '9/15/2023', '9/18/2023', '9/19/2023', '9/20/2023', 
             '9/21/2023', '9/22/2023', '9/25/2023', '9/26/2023', '9/27/2023', '9/28/2023', 
             '9/29/2023'],
    'Index Value': [4496.83, 4465.48, 4451.14, 4457.49, 4487.46, 4461.9, 4467.44, 
                    4505.1, 4450.32, 4453.53, 4443.95, 4402.2, 4330, 4320.06, 
                    4337.44, 4273.53, 4274.51, 4299.7, 4288.05],
    'ER': [0.15066, 0.157105, 0.218561, 0.233893, 0.233709, 0.191352, 0.090935, 
           0.398004, 0.127833, 0.062296, 0.198933, 0.382776, 0.494406, 0.541878, 
           0.44323, 0.489456, 0.466276, 0.396806, 0.52569],
    'SMA': [4496.83, 4465.48, 4451.14, 4457.49, 4487.46, 4461.9, 4467.44, 4505.1, 
            4450.32, 4453.53, 4443.95, 4402.2, 4330, 4320.06, 4337.44, 4273.53, 
            4274.51, 4299.7, 4288.05]
}

df = pd.DataFrame(data)

In an effort to avoid a loop since I'm not great with loop construction, I've tried something like the below, which is close, but not exactly what I want.

df['SMA']=df['SMA'].shift(1) + df['ER'] * (df['Index Value'] - df['SMA'].shift(1))

The reason why it's not exactly what I want is because I want the first number of the Index Value column to match the first value of SMA, and have the formula apply only after that point (which is why it seems a loop would be necessary).

All of that being said and given the goal, can someone please advise what the best way might be to accomplish this task?


Solution

  • First, you can simplify your existing code in two ways:

    df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
    

    Now to answer your question, IIUC you want to keep the first value of column 'Index Value' (i.e., 4496.83) and apply the formula only from the second row onward.

    There are a few options to do that:

    # 1. use `at` and exclude the first value when creating the new column
    df.at[1:, "SMA"] = df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
    
    # 2. create the column and use `at` and `iloc` to replace the first value
    df["SMA"] = df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
    df.at[0, "SMA"] = df.iloc[0]["Index Value"]
    
    # 3. create the column and use `fillna` to replace the first value
    df["SMA"] = df["Index Value"].shift() + df["ER"] * df["Index Value"].diff()
    df["SMA"] = df["SMA"].fillna(df["Index Value"])
    

    All will result in:

             Date  Index Value        ER          SMA
    0    9/5/2023      4496.83  0.150660  4496.830000
    1    9/6/2023      4465.48  0.157105  4491.904758
    2    9/7/2023      4451.14  0.218561  4462.345835
    3    9/8/2023      4457.49  0.233893  4452.625221
    4   9/11/2023      4487.46  0.233709  4464.494259
    5   9/12/2023      4461.90  0.191352  4482.569043
    6   9/13/2023      4467.44  0.090935  4462.403780
    7   9/14/2023      4505.10  0.398004  4482.428831
    8   9/15/2023      4450.32  0.127833  4498.097308
    9   9/18/2023      4453.53  0.062296  4450.519970
    10  9/19/2023      4443.95  0.198933  4451.624222
    11  9/20/2023      4402.20  0.382776  4427.969102
    12  9/21/2023      4330.00  0.494406  4366.503887
    13  9/22/2023      4320.06  0.541878  4324.613733
    14  9/25/2023      4337.44  0.443230  4327.763337
    15  9/26/2023      4273.53  0.489456  4306.158867
    16  9/27/2023      4274.51  0.466276  4273.986950
    17  9/28/2023      4299.70  0.396806  4284.505543
    18  9/29/2023      4288.05  0.525690  4293.575711