Assume I have a very large array of consecutive values representing a stock price over time.
prices = [22,23,25,23,26,25,28,22] # and so on…
When "buying" the stock at any point in time (at any index in that array), I set a Stop Loss and a Take Profit.
buy_index = 2 # buying at 25
stop_loss = 23 # would sell at 23 or below
take_profit = 28 # would sell at 28 or higher
That just means I set two prices at which I would sell: one above the buy price and one below.
My question is: how can I efficiently figure out which of the two prices is hit first?
I tried using numpy with the following steps:
import numpy as np
prices = np.array(prices)
relevant_prices = prices[buy_index:]
stop_loss_index = np.where(relevant_prices <= stop_loss)[0]
take_profit_index = np.where(relevant_prices >= take_profit)[0]
…and then comparing the indexes to determine which case came first. This works, but is extremely slow when done millions of times.
I realize that my code goes through the whole dataset every time it determines an index — there has to be a better way to do this.
Use np.argmax
For boolean arrays, argmax short-circuits: it stops scanning as soon as it finds the first True instead of looping over the entire array.
Code
Using argmax:
relevant_prices = prices[buy_index:]
stop_loss_index = np.argmax(relevant_prices <= stop_loss)
take_profit_index = np.argmax(relevant_prices >= take_profit)
…and then compare the indexes to determine which case came first. Be aware that argmax returns 0 when the condition is never True, so you must check that a level was actually hit before trusting its index.
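Putting it together, here is a minimal sketch (the helper name `first_exit` is mine, not from the question). Indexing the boolean mask at its own argmax is a cheap way to test whether the condition ever became True:

```python
import numpy as np

def first_exit(prices, buy_index, stop_loss, take_profit):
    """Return 'stop_loss', 'take_profit', or None, whichever level is hit first."""
    relevant = prices[buy_index:]
    sl_hit = relevant <= stop_loss
    tp_hit = relevant >= take_profit

    sl_idx = np.argmax(sl_hit)  # 0 if the level is never hit, so verify below
    tp_idx = np.argmax(tp_hit)

    sl_any = sl_hit[sl_idx]  # True only if the stop loss was actually reached
    tp_any = tp_hit[tp_idx]  # True only if the take profit was actually reached

    if not sl_any and not tp_any:
        return None  # neither level was hit in the remaining data
    if not tp_any or (sl_any and sl_idx < tp_idx):
        return 'stop_loss'
    return 'take_profit'

prices = np.array([22, 23, 25, 23, 26, 25, 28, 22])
print(first_exit(prices, buy_index=2, stop_loss=23, take_profit=28))  # stop_loss
```

With your example data, the stop loss at 23 (index 3) is reached before the take profit at 28 (index 6), so the function reports `'stop_loss'`.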
Demonstration of argmax short-circuiting on boolean arrays
import numpy as np  # %timeit below is an IPython magic

print('Million Point Array of False')
arr = np.full(1000000, False, dtype=bool)
%timeit arr.argmax()

print('\nTrue at beginning of Array of False')
arr[0] = True
%timeit arr.argmax()
Output
Note the roughly 52× speed-up in the second case, where the True is at the beginning:
Million Point Array of False
56.9 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True at beginning of Array of False
1.08 µs ± 175 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)