pythonpandaspython-asynciobloombergblpapi

Porting excel functionality into BLPAPI/xbbg with asynchronous API calls


Hello – I am trying to port a few Excel worksheets into a python script I can run on the fly. I am querying Bloomberg data, and essentially what I am looking to do is grab historical bid and ask prices for specific ISINs on specific dates. I am using the xbbg library to query the API and working within a Jupyter Notebook. The data is being stored in a Pandas dataframe.

This is the very simple Excel functionality I am trying to model:

Excel Functionality Screenshot

I have no problem pulling this data for individual line items. However, I am trying to run this for 25,000 index constituents/dates (in Excel you can just drag and drop down the formula). I have code that allows and returns the correct data, however it takes hours to run for this many lines. I believe it is because the code is waiting for each API call to resolve before going on to the next identifier.

I have been looking for solutions to this problem, namely using a library called asyncio. I do have code snippets that will work with the syntax, but it is still taking entirely too much time. Here is my code for the asyncio version to pull this data:

async def bbg_data(isin, start_date, end_date):
    isin = '/isin/' + isin + '@BVAL'
    previous_close_date = pd.to_datetime(end_date).strftime('%Y%m%d')
    px_bid = blp.bdh(tickers=isin, flds=['PX_BID'], start_date=previous_close_date, end_date=previous_close_date)
    if not px_bid.empty:
        px_bid = px_bid.iloc[0][0]
    else:
        px_bid = "N/A"
    px_ask = blp.bdh(tickers=isin, flds=['PX_ASK'], start_date=previous_close_date, end_date=previous_close_date)
    if not px_ask.empty:
        px_ask = px_ask.iloc[0][0]
    else:
        px_ask = "N/A"
    return px_bid, px_ask

async def main():
    tasks = [asyncio.create_task(bbg_data(isin, start_date, end_date)) for isin, start_date, end_date in ttd_sample[['ISIN', 'PreviousClose', 'PreviousClose']].values]
    results = await asyncio.gather(*tasks)
    ttd_sample['previous_bid_price'] = [result[0] for result in results]
    ttd_sample['previous_ask_price'] = [result[1] for result in results]

await main()

A few questions that I have on this issue, hopefully with someone with BLPAPI/asyncio experience:

  1. Is my reliance on using the xbbg library causing me too many headaches? Would I be able to solve this issue in one shot just using blpapi? If so, are there any good tutorials for a beginner programmer?

  2. Is there an issue with how I wrote my asyncio code? I think I am close, but I am not sure if it is truly asynchronous.

Any advice on how to tackle this issue would be appreciated.


Solution

  • The setup costs of each bdh() request are high. It could be quicker to get a range of dates, and then select the one you want. As an example, here there is only one call to bdh() rather than 10:

    from xbbg import blp
    from datetime import date
    import pandas as pd
    
    pairs = [['DE0001102325 Govt',date(2023,4,3)],
             ['DE0001134922 Govt',date(2023,3,15)],
             ['DE0001134922 Govt',date(2023,4,19)],
             ['DE0001102358 Govt',date(2023,2,14)],
             ['DE0001102366 Govt',date(2023,4,12)] ]
    
    tickers = [p[0] for p in pairs]
    dates = [p[1] for p in pairs]
    
    fields = ['PX_BID','PX_ASK']
    
    min_date = min(dates)
    max_date = max(dates)
    
    df = blp.bdh(set(tickers),fields,min_date,max_date)
       
    data=[]
    index=[]
    for [ticker,dt] in pairs:
        data.append([df[ticker].loc[dt][f] for f in fields])
        index.append((ticker,dt))
    
    dfResult = pd.DataFrame(data,index=index,columns=fields)
    
    print(dfResult)
    

    With the output:

                                      PX_BID   PX_ASK
    (DE0001102325 Govt, 2023-04-03)   99.xxx   99.xxx
    (DE0001134922 Govt, 2023-03-15)  102.xxx  103.xxx
    (DE0001134922 Govt, 2023-04-19)  102.xxx  102.xxx
    (DE0001102358 Govt, 2023-02-14)   98.xxx   98.xxx
    (DE0001102366 Govt, 2023-04-12)   97.xxx   97.xxx
    

    Notes: tickers can be repeated with different dates, by using set to only pass distinct tickers to the bdh() call. Depending on the difference between min_date and max_date it may be more efficient to divide up a large number of tickers into smaller chunks with a tighter date range. 25,000 is probably too many to send in one call (you run the risk of the connection timing out), so perhaps start with blocks of 100 isin-date pairs.