I am learning Python, so need direction on how to approach this problem.
I am using the Quandl package to download historical futures data (ESH2000, ESM2000, ESU2000, ESZ2000, ESH2001, ...., ESU2014)
. Now I want to build the continuous back adjusted contract for plotting and backtesting.
Would appreciate advice on the packages to use to accomplish the following (i.e. pandas, or numpy, or straight python, or other packages):
Quandle files have the following data structure:
Date, Open, High, Low, Last, Change, Settle, Volume, Prev. Day Open Interest
ESH2000 data:
3/1/2000 1372.25 1388.75 1371 1384.5 60887 29558
3/2/2000 1384.25 1390.5 1372.5 1385 62489 30059
3/3/2000 1384.75 1414.5 1383.5 1410.5 65432 29923
3/6/2000 1411 1412.75 1386.25 1395 59860 29549
3/7/2000 1394.5 1404.5 1351 1351.75 85263 31256
3/8/2000 1352.75 1376 1348.5 1366 73911 30916
3/9/2000 1367 1405 1357.5 1404 7153 28164
3/10/2000 1403.25 1415.5 1394.25 1399 3192 27549
3/13/2000 1398 1415.75 1364.75 1383.75 2025 26719
3/14/2000 1380.25 1395.75 1359.5 1359.5 1207 25134
ESM2000 data:
3/1/2000 1396 1404 1389.75 1402.75 52 105
3/2/2000 1400.75 1407.75 1391.25 1402.75 91 147
3/3/2000 1400.75 1433 1400.75 1429.75 183 189
3/6/2000 1424.75 1428.75 1405.25 1415 366 379
3/7/2000 1412.5 1423 1370.25 1370.25 1160 1023
3/8/2000 1372.75 1393.75 1366.25 1384 981 1194
3/9/2000 1384.25 1423.5 1376.75 1423 49536 5974
3/10/2000 1423.25 1434.25 1412 1417.75 65615 9561
3/13/2000 1416.75 1418.25 1381.25 1401.75 69428 11559
3/14/2000 1401.25 1414.25 1376.75 1380.5 77793 12057
I want the "roll" to be done on the volume roll to the newer contract.
Edit Add:
Since the ESH2000 contract volume drops on 3/9/2000, as traders move to the ESM2000 contract, I need to "Roll" to the new contract data then, and back adjust the prior values. So in this example, the Settle value on 3/9/2000 for new contract is 1423, and on the old contract is 1404, so I want to match them. So I need to back adjust by adding 19 to all the values for the old contract.
Edit end:
Desired output would be: Date, AdjOpen, AdjHigh, AdjLow, AdjClose (Settle), Vol, OI:
3/1/2000 1391.25 1407.75 1390 1403.5 60887 29558 ADJ ESH2000 Data
3/2/2000 1403.25 1409.5 1391.5 1404 62489 30059 ADJ ESH2000 Data
3/3/2000 1403.75 1433.5 1402.5 1429.5 65432 29923 ADJ ESH2000 Data
3/6/2000 1430 1431.75 1405.25 1414 59860 29549 ADJ ESH2000 Data
3/7/2000 1413.5 1423.5 1370 1370.75 85263 31256 ADJ ESH2000 Data
3/8/2000 1371.75 1395 1367.5 1385 73911 30916 ADJ ESH2000 Data
3/9/2000 1384.25 1423.5 1376.75 1423 49536 5974 ESU2000 Data
3/10/2000 1423.25 1434.25 1412 1417.75 65615 9561 ESU2000 Data
3/13/2000 1416.75 1418.25 1381.25 1401.75 69428 11559 ESU2000 Data
3/14/2000 1401.25 1414.25 1376.75 1380.5 77793 12057 ESU2000 Data
I am reading pandas docs now, but am unsure of how to "compare" the two files/dataframes and how to go output the subset of the files and then backadjust, and then do it for each additional contract in the history.
No I am not asking anyone to do this for me, just hoping for some direction to focus my learning efforts.
Here's something rough to get you started - will take some more thought to make this more general, and likely improvements to be made.
First, I'm going to be working just the March 2000 subset of each contract, assuming each in it's own dataframe.
esm = Quandl.get("CME/ESM2000")['2000-3']
esh = Quandl.get("CME/ESH2000")['2000-3']
A combined (joined by date) view of the data can be achieved using merge
which can join dataframes on indicies and/or columns. Because date is an index in both frames, specify left_index
and right_index
combined = esh.merge(esm, left_index=True, right_index=True)
Common column names will be suffixed with an _x
(left frame) and _y
(right frame). To get a vector of True/False for if the M contract has the highest volume, just compare the two. To get the first date the volume switches over, you can index that vector with itself, and take the first index value.
m_highest = combined['Volume_y'] > combined['Volume_x']
roll_date = m_highest[m_highest].index[0]
Then compute the gap by picking out the settlement for each frame for that date.
roll_gap = esm.loc[roll_date, 'Settle'] - esh.loc[roll_date, 'Settle']
Construct the combined frame by selecting the right dates from each frame and concatenating.
df = pd.concat([esh[esh.index < roll_date], esm[esm.index >= roll_date]])
Finally, select out the earlier dates and adjust by the gap amount.
df.loc[df.index < roll_date, ['Open','High','Low','Settle']] = (
df.loc[df.index < roll_date, ['Open','High','Low','Settle']] + roll_gap)