I have a pandas dataframe that has the earnings date of certain stocks, the eps actual and estimate as well as revenue estimate and actual. For my sample, I only have 10 tickers of all their earnings date but I to eventually incorporate all nasdaq tickers. Anyways, what is the fastest way to go through the pandas dataframe, retrieve the specific date and symbol and pull the stock price for that day (open, high, low, close). I know how to retrieve stock prices individually from the yahoo finance api. (i.e., downloading a specific ticker and retrieving stock prices from a start date and end date) But I'm unsure of how to connect the two. Thanks.
Below is my sample df and what I would like to see...
date symbol eps epsEstimated time revenue revenueEstimated
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 2.976570e+10
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10
.. ... ... ... ... ... ... ...
34 2013-07-24 FB 0.19 0.14 amc 1.813000e+09 1.335895e+09
35 2013-05-01 FB 0.12 0.13 amc 1.458000e+09 1.579500e+09
36 2013-01-30 FB 0.17 0.15 amc 1.585000e+09 1.398529e+09
37 2012-10-23 FB 0.12 0.11 amc 1.262000e+09 1.156833e+09
38 2012-07-26 FB 0.12 0.12 amc 1.184000e+09 1.184000e+09
My desired result (but with values under the new columns):
date symbol eps epsEstimated revenue revenueEstimated Open High Low Clos
0 2022-01-27 CMCSA 0.77 0.73 .033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 3.029800e+10 2.976570e+10
2 2021-07-29 CMCSA 0.84 0.67 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 2.770800e+10 2.309000e+10
.. ... ... ... ... ... ... ...
34 2013-07-24 FB 0.19 0.14 1.813000e+09 1.335895e+09
35 2013-05-01 FB 0.12 0.13 1.458000e+09 1.579500e+09
36 2013-01-30 FB 0.17 0.15 1.585000e+09 1.398529e+09
37 2012-10-23 FB 0.12 0.11 1.262000e+09 1.156833e+09
38 2012-07-26 FB 0.12 0.12 1.184000e+09 1.184000e+09
UPDATE EDIT::This is what I have so far...
the earnings df is called data1. I created three columns Day_0, Day_1 and Day_0_close. In the time column, the value is either amc or bmo. 'amc' means after market open and 'bmo' means before market open. In order for me to analyize earnings reaction on stock price. I need to possibly readjust the dates, which is why I created those new columns. For example bmo, since earnings are released before the market opens on the current day, i need to know yesterdays date and its closing price as Day_0. For amc, i need todays date's and closing price as Day_0_close. Eventually I need to get the next day prices but just keeping it to Day_0_close for now until I can resolve this issue.
date symbol eps epsEstimated time revenue revenueEstimated Day_0 Day_1 Day_0_Close
0 2022-01-27 CMCSA 0.770000 0.7300 bmo 3.033600e+10 3.046110e+10 0.0
1 2021-10-28 CMCSA 0.870000 0.7500 bmo 3.029800e+10 2.976570e+10 0.0
2 2021-07-29 CMCSA 0.840000 0.6700 bmo 2.854600e+10 2.717460e+10 0.0
3 2021-04-29 CMCSA 0.760000 0.5900 bmo 2.720500e+10 2.680920e+10 0.0
I have another df called price1 which has all the stocks price data.
Date Open High ... Adj Close Volume ticker
0 1980-03-17 0.000000 0.101881 ... 0.070243 138396 CMCSA
1 1980-03-18 0.000000 0.101881 ... 0.070243 530518 CMCSA
2 1980-03-19 0.000000 0.100798 ... 0.069462 738113 CMCSA
3 1980-03-20 0.000000 0.108385 ... 0.074925 1360895 CMCSA
4 1980-03-21 0.000000 0.111636 ... 0.077267 461320 CMCSA
... ... ... ... ... ... ... ...
71942 2022-02-18 209.389999 210.750000 ... 206.160004 37049400 FB
71943 2022-02-22 202.339996 207.479996 ... 202.080002 39852400 FB
I then created a for loop to go through each row in data1, to pull the stock ticker and date and get prices. But now I'm getting an error "IndexError: index 0 is out of bounds for axis 0 with size 0" It debugs out at
day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0].
I don't know why it's erroring it out when sometimes the code works but stops several rows in.
See below
date symbol eps epsEstimated time revenue revenueEstimated \
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 2.976570e+10
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10
Day_0 Day_1 Day_0_Close
0 2022-01-26 2022-01-27 48.459999
1 2021-10-27 2021-10-28 0.000000
2 0.000000
Here is what i have so far on my for loop
for idx, row in data1.iterrows():
orig_day = pd.to_datetime(row['date'])
temp_day = orig_day + pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)
prev_temp_day = orig_day - pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)
if row['time'] == 'amc':
data1.at[idx, 'Day_0'] = orig_day.strftime("%Y-%m-%d")
data1.at[idx, 'Day_1'] = temp_day.strftime("%Y-%m-%d")
else:
data1.at[idx, 'Day_0'] = prev_temp_day.strftime("%Y-%m-%d")
data1.at[idx, 'Day_1'] = orig_day.strftime("%Y-%m-%d")
symbol = row['symbol']
date_0 = row['Day_0']
date_1 = row['Day_1']
day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0]
print(day_0_close)
data1.at[idx, 'Day_0_Close'] = day_0_close
Thank you for any help you can give
This solution involves data collection as well, feel free to use this feature or just adapt the data merging using that specific part of the code.
First, setting up the dataframe to test this solution:
df = pd.DataFrame({'Date':['2022-01-27','2021-10-28','2021-07-29','2021-04-29','2021-01-28','2013-07-24','2013-05-01','2013-01-30','2012-10-23','2012-07-26'],
'symbol':['CMCSA','CMCSA','CMCSA','CMCSA','CMCSA','FB','FB','FB','FB','FB'],
'eps' :[0.77,0.87,0.84,0.76,0.56,0.19,0.12,0.17,0.12,0.12],
'epsEstimated' :[0.73,0.75,0.67,0.59,0.48,0.14,0.13,0.15,0.11,0.12],
'time' :['bmo','bmo','bmo','bmo','bmo','amc','amc','amc','amc','amc'],
'revenue' :[3.033600e+10,3.029800e+10,2.854600e+10,2.720500e+10,2.770800e+10,1.813000e+09,1.458000e+09,1.585000e+09,1.262000e+09,1.184000e+09],
'revenueEstimated':[3.046110e+10,3.046110e+10,2.717460e+10,2.680920e+10,2.309000e+10,1.335895e+09,1.579500e+09,1.398529e+09,1.156833e+09,1.184000e+09]})
df['Date'] = pd.to_datetime(df['Date'])
Please notice I named the Date
column with a capital D
.
df
Date symbol eps epsEstimated time revenue revenueEstimated
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 3.046110e+10
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10
5 2013-07-24 FB 0.19 0.14 amc 1.813000e+09 1.335895e+09
6 2013-05-01 FB 0.12 0.13 amc 1.458000e+09 1.579500e+09
7 2013-01-30 FB 0.17 0.15 amc 1.585000e+09 1.398529e+09
8 2012-10-23 FB 0.12 0.11 amc 1.262000e+09 1.156833e+09
9 2012-07-26 FB 0.12 0.12 amc 1.184000e+09 1.184000e+09
Downloading your database with OHLC information:
import yfinance as yf
df_ohlc = yf.download(df['symbol'].unique().tolist(), start=df['Date'].min())[['Open','High','Low','Close']]
df_ohlc
Output (could not format it properly using text, hence the figure):
Now, we stack the symbol level index, rename it and reset all indexes, we want both the symbol
and Date
index as columns, so we can properly merge the data:
df_ohlc = df_ohlc.stack(level=1).reset_index().rename(columns={'level_1':'symbol'})
data1 = df.merge(df_ohlc, how='inner', on=['Date','symbol'])
Output:
data1
Date symbol eps epsEstimated time revenue revenueEstimated Close High Low Open
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10 48.009998 50.070000 45.470001 45.470001
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 3.046110e+10 51.900002 52.740002 49.799999 50.400002
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10 58.110001 59.700001 58.060001 59.200001
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10 56.400002 56.490002 55.279999 55.980000
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10 51.599998 52.290001 49.779999 50.000000
5 2013-07-24 FB 0.19 0.14 amc 1.813000e+09 1.335895e+09 26.510000 26.530001 26.049999 26.320000
6 2013-05-01 FB 0.12 0.13 amc 1.458000e+09 1.579500e+09 27.430000 27.920000 27.309999 27.850000
7 2013-01-30 FB 0.17 0.15 amc 1.585000e+09 1.398529e+09 31.240000 31.490000 30.879999 30.980000
8 2012-10-23 FB 0.12 0.11 amc 1.262000e+09 1.156833e+09 19.500000 19.799999 19.100000 19.250000
9 2012-07-26 FB 0.12 0.12 amc 1.184000e+09 1.184000e+09 26.850000 28.230000 26.730000 27.750000
Done: we got your corresponding OHLC values avoiding any kind of loops.