python pandas vectorization trading back-testing

Vectorize stop loss / take profit backtesting of multiple positions with python pandas

Introduction

I am using python pandas to backtest own strategies on local stored market data. As I'd like to backtest those strategies fast and the data is large (7+ million rows) I am trying to vectorize all operations to do so. For entry signals evaluation that is already the case and it works pretty good. As exit criteria of each entry a take profit and stop loss price threshold is used. That is providing the following DataFrame with datetime index:

import pandas as pd
from pandas import Timestamp
import numpy as np

df = pd.DataFrame({
    'open': {Timestamp('2021-01-03 22:11:00'): 1.22319, Timestamp('2021-01-03 22:12:00'): 1.22315, Timestamp('2021-01-03 22:15:00'): 1.22324, Timestamp('2021-01-03 22:16:00'): 1.22355, Timestamp('2021-01-03 22:17:00'): 1.22357}, 
    'high': {Timestamp('2021-01-03 22:11:00'): 1.22319, Timestamp('2021-01-03 22:12:00'): 1.22318, Timestamp('2021-01-03 22:15:00'): 1.22358, Timestamp('2021-01-03 22:16:00'): 1.2236, Timestamp('2021-01-03 22:17:00'): 1.22361}, 
    'low': {Timestamp('2021-01-03 22:11:00'): 1.22317, Timestamp('2021-01-03 22:12:00'): 1.22315, Timestamp('2021-01-03 22:15:00'): 1.22324, Timestamp('2021-01-03 22:16:00'): 1.22352, Timestamp('2021-01-03 22:17:00'): 1.22355}, 
    'close': {Timestamp('2021-01-03 22:11:00'): 1.22317, Timestamp('2021-01-03 22:12:00'): 1.22315, Timestamp('2021-01-03 22:15:00'): 1.22358, Timestamp('2021-01-03 22:16:00'): 1.22352, Timestamp('2021-01-03 22:17:00'): 1.22356}, 
    'longEntrySignal': {Timestamp('2021-01-03 22:11:00'): False, Timestamp('2021-01-03 22:12:00'): False, Timestamp('2021-01-03 22:15:00'): True, Timestamp('2021-01-03 22:16:00'): False, Timestamp('2021-01-03 22:17:00'): False}, 
    'longEntry': {Timestamp('2021-01-03 22:11:00'): False, Timestamp('2021-01-03 22:12:00'): False, Timestamp('2021-01-03 22:15:00'): False, Timestamp('2021-01-03 22:16:00'): True, Timestamp('2021-01-03 22:17:00'): False}, 
    'longEntryPrice': {Timestamp('2021-01-03 22:11:00'): np.nan, Timestamp('2021-01-03 22:12:00'): np.nan, Timestamp('2021-01-03 22:15:00'): np.nan, Timestamp('2021-01-03 22:16:00'): 1.22355, Timestamp('2021-01-03 22:17:00'): np.nan}, 
    'longTpPrice': {Timestamp('2021-01-03 22:11:00'): np.nan, Timestamp('2021-01-03 22:12:00'): np.nan, Timestamp('2021-01-03 22:15:00'): np.nan, Timestamp('2021-01-03 22:16:00'): 1.2243451663854852, Timestamp('2021-01-03 22:17:00'): np.nan}, 
    'longSlPrice': {Timestamp('2021-01-03 22:11:00'): np.nan, Timestamp('2021-01-03 22:12:00'): np.nan, Timestamp('2021-01-03 22:15:00'): np.nan, Timestamp('2021-01-03 22:16:00'): 1.2227548336145146, Timestamp('2021-01-03 22:17:00'): np.nan}})

print(df)

                    open    high    low     close   longEntrySignal longEntry   longEntryPrice  longTpPrice longSlPrice
2021-01-03 22:11:00 1.22319 1.22319 1.22317 1.22317 False           False       NaN             NaN         NaN
2021-01-03 22:12:00 1.22315 1.22318 1.22315 1.22315 False           False       NaN             NaN         NaN
2021-01-03 22:15:00 1.22324 1.22358 1.22324 1.22358 True            False       NaN             NaN         NaN
2021-01-03 22:16:00 1.22355 1.22360 1.22352 1.22352 False           True        1.22355         1.224345    1.222755
2021-01-03 22:17:00 1.22357 1.22361 1.22355 1.22356 False           False       NaN             NaN         NaN

longEntrySignal indicates through True and False a given signal to open a long position within the next candle. longEntry represents the opened position with True and longEntryPrice stores the open of that candle as an entry price. longTpPrice and longSlPrice are the corresponding prices where the opened position should be closed based on reached take profit or stop loss criteria.

Desired output

Depending on the chosen thresholds for take profit and stop loss it is likely that there will be multiple held positions with different entry points (time) and take profit respectively stop loss thresholds.

Anyway, my current issue is the calculation of how the position is closed after the entry. Meaning the minimum information to validate the strategy performance afterwards would be exitPrice and exitTime columns.

                    open    high    low     close   longEntrySignal longEntry   longEntryPrice  longTpPrice longSlPrice exitPrice exitTime
2021-01-03 22:11:00 1.22319 1.22319 1.22317 1.22317 False           False       NaN             NaN         NaN         NaN       NaN
2021-01-03 22:12:00 1.22315 1.22318 1.22315 1.22315 False           False       NaN             NaN         NaN         NaN       NaN
2021-01-03 22:15:00 1.22324 1.22358 1.22324 1.22358 True            False       NaN             NaN         NaN         NaN       NaN
2021-01-03 22:16:00 1.22355 1.22360 1.22352 1.22352 False           True        1.22355         1.224345    1.222755    1.224345  2021-01-03 22:29:00
2021-01-03 22:17:00 1.22357 1.22361 1.22355 1.22356 False           False       NaN             NaN         NaN         NaN       NaN

exitPrice would be the respective take profit (longTpPrice) or stop loss (longSlPrice) threshold, whereas longSlPrice should be considered if both threshold are reached within the same row (candle). exitTime would then be the corresponding time.

Current approach

Currently I am doing the exit calculation by using the apply()function once I reduced df to rows with given long entries:

entryDf = df[df['longEntry']].copy()
entryDf[['exitPrice', 'exitTime']] = entryDf.apply(lambda x: getLongExit(exitDf=df[['high', 'low']], entryPrice=x['longEntryPrice'], entryTime=x.index, takeProfit=x['longTpPrice'], stopLoss=x['longSlPrice']), axis=1, result_type='expand')

getLongExit is then determining by basically using .loc, .idxmax() and .idxmin() wether, and if when, the take profit or/and stop loss thresholds are reached and comparing the findings to check which occurs earlier. It returns the respective exitPrice with corresponding exitTime.

Based on the determined exitPrice information such as overall gain, win-rate etc. are calculated easily.

I hope I was able to give you a proper understanding of what my issue is. Pretty sure, that there is something much faster possible with using .ffill() or similar, but I was not able to get it work. I am looking forward to your suggestions - thank you!

Edit/Update according to @Andrej's solution

I implemented your solution as suggested but it leads to following error:

Traceback (most recent call last):
  File "/Users/maxwitt/PycharmProjects/ForexStrategies/strategy1.py", line 288, in <module>
    get_long_exit(
  File "/Users/maxwitt/PycharmProjects/ForexStrategies/venv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/Users/maxwitt/PycharmProjects/ForexStrategies/venv/lib/python3.10/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function setitem>) found for signature:
 
 >>> setitem(array(float64, 1d, C), int64, datetime64[ns])
 
There are 16 candidate implementations:
   - Of which 16 did not match due to:
   Overload of function 'setitem': File: <numerous>: Line N/A.
     With argument(s): '(array(float64, 1d, C), int64, datetime64[ns])':
    No match.

During: typing of setitem at /Users/maxwitt/PycharmProjects/ForexStrategies/strategy1.py (167)

File "strategy1.py", line 167:
def get_long_exit(index, high_vals, low_vals, tp_prices, sl_prices, out_exit_price, out_indices):
    <source elided>
                out_exit_price[idx1] = sl_entry
                out_indices[idx1] = index[idx2]
                ^

Solution

IIUC, for this task I'd use numba:

import numba

@numba.njit
def get_long_exit(
    index, high_vals, low_vals, tp_prices, sl_prices, out_exit_price, out_indices
):
    for idx1 in range(len(index) - 1):
        if np.isnan(tp_prices[idx1]):
            continue

        tp_entry, sl_entry = tp_prices[idx1], sl_prices[idx1]

        for idx2 in range(idx1 + 1, len(index)):
            h, l = high_vals[idx2], low_vals[idx2]

            # i'm not sure about these if's, but you can adjust it to your needs:
            if sl_entry > l:
                out_exit_price[idx1] = sl_entry
                out_indices[idx1] = index[idx2]
                break
            elif tp_entry < h:
                out_exit_price[idx1] = tp_entry
                out_indices[idx1] = index[idx2]
                break

df["exitPrice"] = np.nan
df["exitIndex"] = np.nan

# then fill the exitPrice/exitIndex values by:
get_long_exit(
    df.index.values.astype("float64"),  # <-- convert datetime64 to float64
    df.high.values,
    df.low.values,
    df.longTpPrice.values,
    df.longSlPrice.values,
    df.exitPrice.values,
    df.exitIndex.values,
)

# convert back float64 to datetime64
df["exitIndex"] = df["exitIndex"].astype("datetime64[ns]")

print(df)