Hello – I am trying to port a few Excel worksheets into a Python script I can run on the fly. I am querying Bloomberg data, and essentially what I am looking to do is grab historical bid and ask prices for specific ISINs on specific dates. I am using the xbbg library to query the API, working within a Jupyter Notebook, and storing the data in a pandas DataFrame.
This is the very simple Excel functionality I am trying to model:
(screenshot of the Excel functionality omitted)
I have no problem pulling this data for individual line items. However, I am trying to run this for 25,000 index constituents/dates (in Excel you can just drag the formula down). I have code that returns the correct data; however, it takes hours to run for this many lines. I believe this is because the code waits for each API call to resolve before moving on to the next identifier.

I have been looking for solutions to this problem, chiefly the asyncio library. I have code snippets that work with the syntax, but it still takes entirely too much time. Here is my asyncio version of the code to pull this data:
```python
import asyncio

import pandas as pd
from xbbg import blp


async def bbg_data(isin, start_date, end_date):
    isin = '/isin/' + isin + '@BVAL'
    previous_close_date = pd.to_datetime(end_date).strftime('%Y%m%d')
    px_bid = blp.bdh(tickers=isin, flds=['PX_BID'],
                     start_date=previous_close_date, end_date=previous_close_date)
    if not px_bid.empty:
        px_bid = px_bid.iloc[0][0]
    else:
        px_bid = "N/A"
    px_ask = blp.bdh(tickers=isin, flds=['PX_ASK'],
                     start_date=previous_close_date, end_date=previous_close_date)
    if not px_ask.empty:
        px_ask = px_ask.iloc[0][0]
    else:
        px_ask = "N/A"
    return px_bid, px_ask


async def main():
    tasks = [asyncio.create_task(bbg_data(isin, start_date, end_date))
             for isin, start_date, end_date
             in ttd_sample[['ISIN', 'PreviousClose', 'PreviousClose']].values]
    results = await asyncio.gather(*tasks)
    ttd_sample['previous_bid_price'] = [result[0] for result in results]
    ttd_sample['previous_ask_price'] = [result[1] for result in results]


# Top-level await works inside a Jupyter notebook
await main()
```
A few questions I have on this issue, hopefully for someone with BLPAPI/asyncio experience:
Is my reliance on using the xbbg library causing me too many headaches? Would I be able to solve this issue in one shot just using blpapi? If so, are there any good tutorials for a beginner programmer?
Is there an issue with how I wrote my asyncio code? I think I am close, but I am not sure if it is truly asynchronous.
Any advice on how to tackle this issue would be appreciated.
The setup costs of each `bdh()` request are high. It could be quicker to get a range of dates and then select the one you want. In the example below there is only one call to `bdh()` rather than 10:
```python
from datetime import date

import pandas as pd
from xbbg import blp

pairs = [['DE0001102325 Govt', date(2023, 4, 3)],
         ['DE0001134922 Govt', date(2023, 3, 15)],
         ['DE0001134922 Govt', date(2023, 4, 19)],
         ['DE0001102358 Govt', date(2023, 2, 14)],
         ['DE0001102366 Govt', date(2023, 4, 12)]]

tickers = [p[0] for p in pairs]
dates = [p[1] for p in pairs]
fields = ['PX_BID', 'PX_ASK']

min_date = min(dates)
max_date = max(dates)

# One call covering all distinct tickers and the full date range
df = blp.bdh(set(tickers), fields, min_date, max_date)

# Pick out the single date wanted for each ticker
data = []
index = []
for ticker, dt in pairs:
    data.append([df[ticker].loc[dt][f] for f in fields])
    index.append((ticker, dt))

dfResult = pd.DataFrame(data, index=index, columns=fields)
print(dfResult)
```
With the output:

```none
                                 PX_BID  PX_ASK
(DE0001102325 Govt, 2023-04-03)  99.xxx  99.xxx
(DE0001134922 Govt, 2023-03-15) 102.xxx 103.xxx
(DE0001134922 Govt, 2023-04-19) 102.xxx 102.xxx
(DE0001102358 Govt, 2023-02-14)  98.xxx  98.xxx
(DE0001102366 Govt, 2023-04-12)  97.xxx  97.xxx
```
Notes: tickers can be repeated with different dates; `set` is used to pass only distinct tickers to the `bdh()` call. Depending on the difference between `min_date` and `max_date`, it may be more efficient to divide a large number of tickers into smaller chunks with a tighter date range. 25,000 is probably too many to send in one call (you run the risk of the connection timing out), so perhaps start with blocks of 100 ISIN-date pairs.