I have got OHLC data with missing time frames. Suppose I have the following pandas dataframe denoted by the variable df:
Open High Low Close
2019-04-19 00:00:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:02:00 0.67062 0.67425 0.67060 0.67223
Now, I resample that pandas dataframe to fill the missing gap and I get the following:
df = df.resample('T').ffill()
Open High Low Close
2019-04-19 00:00:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:01:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:02:00 0.67062 0.67425 0.67060 0.67223
From the above, we can see that the missing gap (00:01:00) is filled with the help of ffill(). However, the data in that row (row starting with 00:01:00) is not displayed properly as the opening price should be the same as the closing price of the previous row (row starting with 00:00:00). Likewise, the closing price of that row (row starting with 00:01:00) should be the same as the opening price of the next row (row starting with 00:02:00). The desired output should look like this:
Open High Low Close
2019-04-19 00:00:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:01:00 0.67123 0.67123 0.67064 0.67062
2019-04-19 00:02:00 0.67062 0.67425 0.67060 0.67223
How would I resolve this problem in pandas?
Unfortunately, you can't directly specify a fill method per column.
A workaround would be not to fill the values during the resampling but to do it afterwards:
df = df.resample('T').fillna(None)
df['Open'], df['Close'] = (df['Open'].fillna(df['Close'].ffill()),
df['Close'].fillna(df['Open'].bfill()))
df = df.ffill()
output:
Open High Low Close
2019-04-19 00:00:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:01:00 0.67123 0.67123 0.67064 0.67062
2019-04-19 00:02:00 0.67062 0.67425 0.67060 0.67223
(df.resample('T')
.fillna(None)
.assign(Close=lambda d: d['Close'].bfill()) # bfill for Close
.ffill() # ffill for others
)
output:
Open High Low Close
2019-04-19 00:00:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:01:00 0.67068 0.67123 0.67064 0.67223
2019-04-19 00:02:00 0.67062 0.67425 0.67060 0.67223
(df.resample('T')
.fillna(None)
.assign(Open=lambda d: d['Open'].fillna(d['Close'].ffill())) # Open = last Close
.ffill() # ffill the others
)
output:
Open High Low Close
2019-04-19 00:00:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:01:00 0.67123 0.67123 0.67064 0.67123
2019-04-19 00:02:00 0.67062 0.67425 0.67060 0.67223
Here is another example where we'll interpolate High and leave Low as NaNs:
(df.resample('T')
.fillna(None)
.assign(Open=lambda d: d['Open'].ffill(),
Close=lambda d: d['Close'].bfill(),
High=lambda d: d['High'].interpolate()
)
)
output:
Open High Low Close
2019-04-19 00:00:00 0.67068 0.67123 0.67064 0.67123
2019-04-19 00:01:00 0.67068 0.67274 NaN 0.67223
2019-04-19 00:02:00 0.67062 0.67425 0.67060 0.67223