pythonpandasfinancialtrading

Build Back Adjusted Continous Futures contract from Historical Data


I am learning Python, so need direction on how to approach this problem.

I am using the Quandl package to download historical futures data (ESH2000, ESM2000, ESU2000, ESZ2000, ESH2001, ...., ESU2014). Now I want to build the continuous back adjusted contract for plotting and backtesting.

Would appreciate advice on the packages to use to accomplish the following (i.e. pandas, or numpy, or straight python, or other packages):

Quandle files have the following data structure:

Date, Open, High, Low, Last, Change, Settle, Volume, Prev. Day Open Interest

ESH2000 data:
3/1/2000    1372.25 1388.75 1371            1384.5  60887   29558
3/2/2000    1384.25 1390.5  1372.5          1385    62489   30059
3/3/2000    1384.75 1414.5  1383.5          1410.5  65432   29923
3/6/2000    1411    1412.75 1386.25         1395    59860   29549
3/7/2000    1394.5  1404.5  1351            1351.75 85263   31256
3/8/2000    1352.75 1376    1348.5          1366    73911   30916
3/9/2000    1367    1405    1357.5          1404    7153    28164
3/10/2000   1403.25 1415.5  1394.25         1399    3192    27549
3/13/2000   1398    1415.75 1364.75         1383.75 2025    26719
3/14/2000   1380.25 1395.75 1359.5          1359.5  1207    25134


ESM2000 data:
3/1/2000    1396    1404    1389.75         1402.75 52  105
3/2/2000    1400.75 1407.75 1391.25         1402.75 91  147
3/3/2000    1400.75 1433    1400.75         1429.75 183 189
3/6/2000    1424.75 1428.75 1405.25         1415    366 379
3/7/2000    1412.5  1423    1370.25         1370.25 1160    1023
3/8/2000    1372.75 1393.75 1366.25         1384    981 1194
3/9/2000    1384.25 1423.5  1376.75         1423    49536   5974
3/10/2000   1423.25 1434.25 1412            1417.75 65615   9561
3/13/2000   1416.75 1418.25 1381.25         1401.75 69428   11559
3/14/2000   1401.25 1414.25 1376.75         1380.5  77793   12057

I want the "roll" to be done on the volume roll to the newer contract.

Edit Add:
Since the ESH2000 contract volume drops on 3/9/2000, as traders move to the ESM2000 contract, I need to "Roll" to the new contract data then, and back adjust the prior values. So in this example, the Settle value on 3/9/2000 for new contract is 1423, and on the old contract is 1404, so I want to match them. So I need to back adjust by adding 19 to all the values for the old contract.
Edit end:

Desired output would be: Date, AdjOpen, AdjHigh, AdjLow, AdjClose (Settle), Vol, OI:

3/1/2000    1391.25 1407.75 1390    1403.5  60887   29558       ADJ ESH2000 Data
3/2/2000    1403.25 1409.5  1391.5  1404    62489   30059       ADJ ESH2000 Data
3/3/2000    1403.75 1433.5  1402.5  1429.5  65432   29923       ADJ ESH2000 Data
3/6/2000    1430    1431.75 1405.25 1414    59860   29549       ADJ ESH2000 Data
3/7/2000    1413.5  1423.5  1370    1370.75 85263   31256       ADJ ESH2000 Data
3/8/2000    1371.75 1395    1367.5  1385    73911   30916       ADJ ESH2000 Data
3/9/2000    1384.25 1423.5  1376.75 1423    49536   5974        ESU2000 Data
3/10/2000   1423.25 1434.25 1412    1417.75 65615   9561        ESU2000 Data
3/13/2000   1416.75 1418.25 1381.25 1401.75 69428   11559       ESU2000 Data
3/14/2000   1401.25 1414.25 1376.75 1380.5  77793   12057       ESU2000 Data

I am reading pandas docs now, but am unsure of how to "compare" the two files/dataframes and how to go output the subset of the files and then backadjust, and then do it for each additional contract in the history.

No I am not asking anyone to do this for me, just hoping for some direction to focus my learning efforts.


Solution

  • Here's something rough to get you started - will take some more thought to make this more general, and likely improvements to be made.

    First, I'm going to be working just the March 2000 subset of each contract, assuming each in it's own dataframe.

    esm = Quandl.get("CME/ESM2000")['2000-3']
    esh = Quandl.get("CME/ESH2000")['2000-3']
    

    A combined (joined by date) view of the data can be achieved using merge which can join dataframes on indicies and/or columns. Because date is an index in both frames, specify left_index and right_index

    combined = esh.merge(esm, left_index=True, right_index=True)
    

    Common column names will be suffixed with an _x (left frame) and _y (right frame). To get a vector of True/False for if the M contract has the highest volume, just compare the two. To get the first date the volume switches over, you can index that vector with itself, and take the first index value.

    m_highest = combined['Volume_y'] > combined['Volume_x']
    roll_date = m_highest[m_highest].index[0]
    

    Then compute the gap by picking out the settlement for each frame for that date.

    roll_gap = esm.loc[roll_date, 'Settle'] - esh.loc[roll_date, 'Settle']
    

    Construct the combined frame by selecting the right dates from each frame and concatenating.

    df = pd.concat([esh[esh.index < roll_date], esm[esm.index >= roll_date]])
    

    Finally, select out the earlier dates and adjust by the gap amount.

    df.loc[df.index < roll_date, ['Open','High','Low','Settle']] = (
             df.loc[df.index < roll_date, ['Open','High','Low','Settle']] + roll_gap)