Introduction
I am using python pandas
to backtest own strategies on local stored market data. As I'd like to backtest those strategies fast and the data is large (7+ million rows) I am trying to vectorize all operations to do so. For entry signals evaluation that is already the case and it works pretty good. As exit criteria of each entry a take profit and stop loss price threshold is used. That is providing the following DataFrame
with datetime index
:
import pandas as pd
from pandas import Timestamp
import numpy as np
df = pd.DataFrame({
'open': {Timestamp('2021-01-03 22:11:00'): 1.22319, Timestamp('2021-01-03 22:12:00'): 1.22315, Timestamp('2021-01-03 22:15:00'): 1.22324, Timestamp('2021-01-03 22:16:00'): 1.22355, Timestamp('2021-01-03 22:17:00'): 1.22357},
'high': {Timestamp('2021-01-03 22:11:00'): 1.22319, Timestamp('2021-01-03 22:12:00'): 1.22318, Timestamp('2021-01-03 22:15:00'): 1.22358, Timestamp('2021-01-03 22:16:00'): 1.2236, Timestamp('2021-01-03 22:17:00'): 1.22361},
'low': {Timestamp('2021-01-03 22:11:00'): 1.22317, Timestamp('2021-01-03 22:12:00'): 1.22315, Timestamp('2021-01-03 22:15:00'): 1.22324, Timestamp('2021-01-03 22:16:00'): 1.22352, Timestamp('2021-01-03 22:17:00'): 1.22355},
'close': {Timestamp('2021-01-03 22:11:00'): 1.22317, Timestamp('2021-01-03 22:12:00'): 1.22315, Timestamp('2021-01-03 22:15:00'): 1.22358, Timestamp('2021-01-03 22:16:00'): 1.22352, Timestamp('2021-01-03 22:17:00'): 1.22356},
'longEntrySignal': {Timestamp('2021-01-03 22:11:00'): False, Timestamp('2021-01-03 22:12:00'): False, Timestamp('2021-01-03 22:15:00'): True, Timestamp('2021-01-03 22:16:00'): False, Timestamp('2021-01-03 22:17:00'): False},
'longEntry': {Timestamp('2021-01-03 22:11:00'): False, Timestamp('2021-01-03 22:12:00'): False, Timestamp('2021-01-03 22:15:00'): False, Timestamp('2021-01-03 22:16:00'): True, Timestamp('2021-01-03 22:17:00'): False},
'longEntryPrice': {Timestamp('2021-01-03 22:11:00'): np.nan, Timestamp('2021-01-03 22:12:00'): np.nan, Timestamp('2021-01-03 22:15:00'): np.nan, Timestamp('2021-01-03 22:16:00'): 1.22355, Timestamp('2021-01-03 22:17:00'): np.nan},
'longTpPrice': {Timestamp('2021-01-03 22:11:00'): np.nan, Timestamp('2021-01-03 22:12:00'): np.nan, Timestamp('2021-01-03 22:15:00'): np.nan, Timestamp('2021-01-03 22:16:00'): 1.2243451663854852, Timestamp('2021-01-03 22:17:00'): np.nan},
'longSlPrice': {Timestamp('2021-01-03 22:11:00'): np.nan, Timestamp('2021-01-03 22:12:00'): np.nan, Timestamp('2021-01-03 22:15:00'): np.nan, Timestamp('2021-01-03 22:16:00'): 1.2227548336145146, Timestamp('2021-01-03 22:17:00'): np.nan}})
print(df)
open high low close longEntrySignal longEntry longEntryPrice longTpPrice longSlPrice
2021-01-03 22:11:00 1.22319 1.22319 1.22317 1.22317 False False NaN NaN NaN
2021-01-03 22:12:00 1.22315 1.22318 1.22315 1.22315 False False NaN NaN NaN
2021-01-03 22:15:00 1.22324 1.22358 1.22324 1.22358 True False NaN NaN NaN
2021-01-03 22:16:00 1.22355 1.22360 1.22352 1.22352 False True 1.22355 1.224345 1.222755
2021-01-03 22:17:00 1.22357 1.22361 1.22355 1.22356 False False NaN NaN NaN
longEntrySignal
indicates throughTrue
andFalse
a given signal to open a long position within the next candle.longEntry
represents the opened position withTrue
andlongEntryPrice
stores the open of that candle as an entry price.longTpPrice
andlongSlPrice
are the corresponding prices where the opened position should be closed based on reached take profit or stop loss criteria.
Desired output
Depending on the chosen thresholds for take profit and stop loss it is likely that there will be multiple held positions with different entry points (time) and take profit respectively stop loss thresholds.
Anyway, my current issue is the calculation of how the position is closed after the entry. Meaning the minimum information to validate the strategy performance afterwards would be exitPrice
and exitTime
columns.
open high low close longEntrySignal longEntry longEntryPrice longTpPrice longSlPrice exitPrice exitTime
2021-01-03 22:11:00 1.22319 1.22319 1.22317 1.22317 False False NaN NaN NaN NaN NaN
2021-01-03 22:12:00 1.22315 1.22318 1.22315 1.22315 False False NaN NaN NaN NaN NaN
2021-01-03 22:15:00 1.22324 1.22358 1.22324 1.22358 True False NaN NaN NaN NaN NaN
2021-01-03 22:16:00 1.22355 1.22360 1.22352 1.22352 False True 1.22355 1.224345 1.222755 1.224345 2021-01-03 22:29:00
2021-01-03 22:17:00 1.22357 1.22361 1.22355 1.22356 False False NaN NaN NaN NaN NaN
exitPrice
would be the respective take profit (longTpPrice
) or stop loss (longSlPrice
) threshold, whereaslongSlPrice
should be considered if both threshold are reached within the samerow
(candle).exitTime
would then be the corresponding time.
Current approach
Currently I am doing the exit calculation by using the apply()
function once I reduced df
to rows with given long entries:
entryDf = df[df['longEntry']].copy()
entryDf[['exitPrice', 'exitTime']] = entryDf.apply(lambda x: getLongExit(exitDf=df[['high', 'low']], entryPrice=x['longEntryPrice'], entryTime=x.index, takeProfit=x['longTpPrice'], stopLoss=x['longSlPrice']), axis=1, result_type='expand')
getLongExit
is then determining by basically using.loc
,.idxmax()
and.idxmin()
wether, and if when, the take profit or/and stop loss thresholds are reached and comparing the findings to check which occurs earlier. It returns the respectiveexitPrice
with correspondingexitTime
.
Based on the determined exitPrice
information such as overall gain, win-rate etc. are calculated easily.
I hope I was able to give you a proper understanding of what my issue is. Pretty sure, that there is something much faster possible with using .ffill()
or similar, but I was not able to get it work. I am looking forward to your suggestions - thank you!
Edit/Update according to @Andrej's solution
I implemented your solution as suggested but it leads to following error:
Traceback (most recent call last):
File "/Users/maxwitt/PycharmProjects/ForexStrategies/strategy1.py", line 288, in <module>
get_long_exit(
File "/Users/maxwitt/PycharmProjects/ForexStrategies/venv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
error_rewrite(e, 'typing')
File "/Users/maxwitt/PycharmProjects/ForexStrategies/venv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function setitem>) found for signature:
>>> setitem(array(float64, 1d, C), int64, datetime64[ns])
There are 16 candidate implementations:
- Of which 16 did not match due to:
Overload of function 'setitem': File: <numerous>: Line N/A.
With argument(s): '(array(float64, 1d, C), int64, datetime64[ns])':
No match.
During: typing of setitem at /Users/maxwitt/PycharmProjects/ForexStrategies/strategy1.py (167)
File "strategy1.py", line 167:
def get_long_exit(index, high_vals, low_vals, tp_prices, sl_prices, out_exit_price, out_indices):
<source elided>
out_exit_price[idx1] = sl_entry
out_indices[idx1] = index[idx2]
^
IIUC, for this task I'd use numba:
import numba
@numba.njit
def get_long_exit(
index, high_vals, low_vals, tp_prices, sl_prices, out_exit_price, out_indices
):
for idx1 in range(len(index) - 1):
if np.isnan(tp_prices[idx1]):
continue
tp_entry, sl_entry = tp_prices[idx1], sl_prices[idx1]
for idx2 in range(idx1 + 1, len(index)):
h, l = high_vals[idx2], low_vals[idx2]
# i'm not sure about these if's, but you can adjust it to your needs:
if sl_entry > l:
out_exit_price[idx1] = sl_entry
out_indices[idx1] = index[idx2]
break
elif tp_entry < h:
out_exit_price[idx1] = tp_entry
out_indices[idx1] = index[idx2]
break
df["exitPrice"] = np.nan
df["exitIndex"] = np.nan
# then fill the exitPrice/exitIndex values by:
get_long_exit(
df.index.values.astype("float64"), # <-- convert datetime64 to float64
df.high.values,
df.low.values,
df.longTpPrice.values,
df.longSlPrice.values,
df.exitPrice.values,
df.exitIndex.values,
)
# convert back float64 to datetime64
df["exitIndex"] = df["exitIndex"].astype("datetime64[ns]")
print(df)