python pandas loops iteration deviation

How to implement a loop over all columns within a calculation in pandas?


I'm new to pandas and Python and I'm struggling to implement loops in my code. I hope someone can help me.

I have the following DataFrame:

import pandas as pd
from numpy import nan  # needed for the bare nan values below
from pandas import Timestamp

df = pd.DataFrame({'DateTime': {0: Timestamp('2021-06-13 00:00:00'),
  1: Timestamp('2021-06-13 02:00:00'),
  2: Timestamp('2021-06-13 05:00:00'),
  3: Timestamp('2021-06-13 07:00:00'),
  4: Timestamp('2021-06-13 10:00:00')},
 'actual_value': {0: 180.0949105082311,
  1: 183.93185469787613,
  2: 191.48399886639095,
  3: 188.31358023933768,
  4: 159.32768035801615},
 'forecast_0': {0: nan,
  1: 185.0,
  2: 206.0,
  3: 193.0,
  4: 130.0},
 'forecast_1': {0: 187.0,
  1: 185.0,
  2: 206.0,
  3: 192.0,
  4: 130.0},
 'forecast_2': {0: 186.0,
  1: nan,
  2: 200.0,
  3: 192.0,
  4: nan},
 'forecast_3': {0: 186.0,
  1: 185.0,
  2: 200.0,
  3: 192.0,
  4: 130.0},
 'forecast_4': {0: 186.0,
  1: 183.0,
  2: 200.0,
  3: 188.0,
  4: 130.0}})

             DateTime  actual_value  forecast_0  forecast_1  forecast_2  \
0 2021-06-13 00:00:00    180.094911         NaN       187.0       186.0   
1 2021-06-13 02:00:00    183.931855       185.0       185.0         NaN   
2 2021-06-13 05:00:00    191.483999       206.0       206.0       200.0   
3 2021-06-13 07:00:00    188.313580       193.0       192.0       192.0   
4 2021-06-13 10:00:00    159.327680       130.0       130.0         NaN   

   forecast_3  forecast_4  
0       186.0       186.0  
1       185.0       183.0  
2       200.0       200.0  
3       192.0       188.0  
4       130.0       130.0  

I want to create a new DataFrame, or replace the numbers in the existing one, using a simple calculation: the deviation of each forecast value relative to the actual value in the second column. Since there are over 40 such forecast columns, it is simply too time-consuming to write out the calculation for every column. That's why I would like to use a loop. I tried the following code, which didn't work:

for i, col in enumerate(df.columns, -2):
    df[col] = (df[col]-df['actual_value'])/df['actual_value']

I get the error that 'subtract' cannot use operands with types dtype('<M8[ns]') and dtype('float64'). Does anyone have an idea how to solve this issue? I'm thankful for every message.


Solution

  • The error 'subtract' cannot use operands with types dtype('<M8[ns]') and dtype('float64') occurs because the loop runs over every column, including the first one, DateTime. On that first iteration it tries to subtract the floats in actual_value from datetime values. Note that the second argument of enumerate only sets the starting value of the counter i; it does not skip any columns.

    To fix this, skip the first two columns by changing the loop to for col in df.columns[2:]:

    That said, I agree with the other solutions posted here: it is more elegant to do this without a loop.
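
For illustration, here is a minimal sketch of both approaches: the corrected loop and a loop-free version. The small DataFrame is a shortened, made-up stand-in for the one in the question, and the names looped, vectorized and forecast_cols are only chosen for this example. It assumes that all forecast columns share the forecast_ prefix.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical stand-in for the DataFrame in the question
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2021-06-13 00:00:00', '2021-06-13 02:00:00']),
    'actual_value': [180.0, 200.0],
    'forecast_0': [np.nan, 190.0],
    'forecast_1': [189.0, 210.0],
})

# Option 1: loop, skipping the first two columns (DateTime, actual_value)
looped = df.copy()
for col in looped.columns[2:]:
    looped[col] = (looped[col] - looped['actual_value']) / looped['actual_value']

# Option 2: no loop; pick the forecast columns by their name prefix
# (assumption: every forecast column starts with 'forecast_')
forecast_cols = [c for c in df.columns if c.startswith('forecast_')]
vectorized = df.copy()
vectorized[forecast_cols] = (
    df[forecast_cols]
    .sub(df['actual_value'], axis=0)   # subtract actual_value row by row
    .div(df['actual_value'], axis=0)   # divide by actual_value row by row
)

print(vectorized)
```

Both versions work on a copy, so the original values stay available. Selecting the columns by prefix rather than by position may be the safer choice if the column order could change. NaN forecasts simply stay NaN in the result.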