I have a dataframe of position_signal
and close
(closing prices) for a given asset. You can think of each row being a specific time step (minute, hour, etc.). I want to run a vectorized backtest to calculate my aggregated portfolio return at each point in time whether i'm long, short or not holding. In which case the position_signal
column is either 1
, -1
, 0
, respectively.
I've tried a lot... seems like an easy problem. There are a lot of tutorials online that claim to do it. However the problem with all of these is that they either: assume cumulative returns during trading (so each day's returns are multiplied by the next, which isn't right if you're just holding a position) or they don't calculate short returns correctly.
The code for most of them boils down to this:
df['log_returns'] = np.log(df['close']) - np.log(df['close'].shift(1))
df['strategy_returns'] = df['position_signal'] * df['log_returns']
df['cumulative_ret'] = df['strategy_returns'].cumsum().apply(np.exp)
Now, the math is fairly annoying so I don't want to do it here... but essentially this works fine with longs, but breaks down when trying to calculate the returns for shorts. The reason is because it thinks that our returns on our shorts is just the negative of our loss on an equivalent long position. Which is wrong.
Let's say I have a dataframe that represents two trades taken: one that is long
from price = $1
--> $2
and the other is short from $2
-> $1
. Now, if we were to calculate the returns for this would simply be: 100% on the first trade, and 50% on the second.
If we held a position of $100 at the start of the session, our first trade (+100%) would turn our $100 to $200, then our second trade (+50%) would turn our $200 to $300. So our final position is $300 (or 300%).
The problem is that the code given in these tutorials does the long right but then thinks our returns for the short is 100% (because our returns are the negative of a -100% returns on a long).
Here is a code snippet that demonstrates this. A minor thing to note, we enter at the price above the first row where position_signal = 1/-1
, this basically means our position_signal
is shifted by 1 to account for look-ahead bias. It's a semantics issue and not really an issue, you can do it however you'd like as long as returns are the same
df = pd.DataFrame([[0.0, 1.0],
[0.0, 1.2],
[0.0, 1.50],
[0.0, 1],
[1.0, 1.2], # We entered a LONG at the open of this timestep, which is the same as the close of the previous ($1.0)
[1.0, 1.3],
[1.0, 2.0], # We exit at close of this timestep, so $2
[0.0, 1.7],
[0.0, 2],
[-1.0, 1.798], # We entered a SHORT at the open of this timestep, which is the same as the close of the previous ($2.0)
[-1.0, 0.50],
[-1.0, 1.3],
[-1.0, 1], # We exit at close of this timestep, so $1.0
[0.0, 1.5]],
columns=['position_signal','close'])
df['log_returns'] = np.log(df.close) - np.log(df.close.shift(1))
df['strategy_returns'] = df.log_returns * df.position_signal
df['cumulative_returns'] = df.strategy_returns.cumsum().apply(np.exp)
print(df)
position_signal close log_returns strategy_returns cumulative_returns
0 0.0 1.000 NaN NaN NaN
1 0.0 1.200 0.182322 0.000000 1.000000
2 0.0 1.500 0.223144 0.000000 1.000000
3 0.0 1.000 -0.405465 -0.000000 1.000000
4 1.0 1.200 0.182322 0.182322 1.200000
5 1.0 1.300 0.080043 0.080043 1.300000
6 1.0 2.000 0.430783 0.430783 2.000000
7 0.0 1.700 -0.162519 -0.000000 2.000000
8 0.0 2.000 0.162519 0.000000 2.000000
9 -1.0 1.798 -0.106472 0.106472 2.224694
10 -1.0 0.500 -1.279822 1.279822 8.000000
11 -1.0 1.300 0.955511 -0.955511 3.076923
12 -1.0 1.000 -0.262364 0.262364 4.000000
13 0.0 1.500 0.405465 0.000000 4.000000
As you can see, by the end it assumes our returns are 4.0 (400%), which is not what we want. It should be 3.0 (300%)
1 - There's a way to do this if we ignore the intermediate price ticks. If we only reduce this df to four rows: enter trade 1, exit trade 1, enter trade 2, exit trade 2, then we can just mask on position signal, and calculate returns on each trade as a single row and then run cumprod on that. I don't want that, I need to know my portfolio value in the intermediate. Which is why I added those rows for the example above.
2 - There's also a way to do this by dividing the pnl due to shorts and pnl due to longs. But this isn't good because if I'm down 50% due to a bad long, it affects my purchasing power for the next trade (i.e. being up 50% for the next trade doesn't make me break-even)
3 - There's also the non-vectorized route, but that is too slow for me.
The log returns do produce an overall return of 3.0, your formula for conversion from log returns to arithmetic returns is incorrect as you are not subtracting one after the exponentialization. To recap the core formulae are:
Let P1 = Initial Price, P2 = Exit Price
Arithmetic Return
R = (P2-P1)/P1 = P2/P1 - 1
Log Return
r = ln(P2/P1) = ln(p2) - ln(p1)
So to convert from log return r to arithmetic R the formula is:
R = exp(r) - 1
For your example:
import numpy as np
np.exp(np.log(2/1) - np.log(1/2))-1
As a matter of fact, this as well as the opposite conversion:
r = ln(1+R)
are so common that numpy has specific functions for both:
np.log1p
np.expm1