pythonpandasstockyfinance

retrieve stock price data from specific dates in pandas df


I have a pandas dataframe that has the earnings date of certain stocks, the eps actual and estimate as well as revenue estimate and actual. For my sample, I only have 10 tickers of all their earnings date but I to eventually incorporate all nasdaq tickers. Anyways, what is the fastest way to go through the pandas dataframe, retrieve the specific date and symbol and pull the stock price for that day (open, high, low, close). I know how to retrieve stock prices individually from the yahoo finance api. (i.e., downloading a specific ticker and retrieving stock prices from a start date and end date) But I'm unsure of how to connect the two. Thanks.

Below is my sample df and what I would like to see...

          date symbol   eps  epsEstimated time       revenue  revenueEstimated
0   2022-01-27  CMCSA  0.77          0.73  bmo  3.033600e+10      3.046110e+10
1   2021-10-28  CMCSA  0.87          0.75  bmo  3.029800e+10      2.976570e+10
2   2021-07-29  CMCSA  0.84          0.67  bmo  2.854600e+10      2.717460e+10
3   2021-04-29  CMCSA  0.76          0.59  bmo  2.720500e+10      2.680920e+10
4   2021-01-28  CMCSA  0.56          0.48  bmo  2.770800e+10      2.309000e+10
..         ...    ...   ...           ...  ...           ...               ...
34  2013-07-24     FB  0.19          0.14  amc  1.813000e+09      1.335895e+09
35  2013-05-01     FB  0.12          0.13  amc  1.458000e+09      1.579500e+09
36  2013-01-30     FB  0.17          0.15  amc  1.585000e+09      1.398529e+09
37  2012-10-23     FB  0.12          0.11  amc  1.262000e+09      1.156833e+09
38  2012-07-26     FB  0.12          0.12  amc  1.184000e+09      1.184000e+09

My desired result (but with values under the new columns):

          date symbol   eps  epsEstimated  revenue       revenueEstimated Open High Low Clos 
0   2022-01-27  CMCSA  0.77          0.73  .033600e+10      3.046110e+10     
1   2021-10-28  CMCSA  0.87          0.75  3.029800e+10      2.976570e+10
2   2021-07-29  CMCSA  0.84          0.67  2.854600e+10      2.717460e+10
3   2021-04-29  CMCSA  0.76          0.59  2.720500e+10      2.680920e+10
4   2021-01-28  CMCSA  0.56          0.48  2.770800e+10      2.309000e+10
..         ...    ...   ...           ...  ...           ...               ...
34  2013-07-24     FB  0.19          0.14  1.813000e+09      1.335895e+09
35  2013-05-01     FB  0.12          0.13  1.458000e+09      1.579500e+09
36  2013-01-30     FB  0.17          0.15  1.585000e+09      1.398529e+09
37  2012-10-23     FB  0.12          0.11  1.262000e+09      1.156833e+09
38  2012-07-26     FB  0.12          0.12  1.184000e+09      1.184000e+09


UPDATE EDIT::This is what I have so far...

the earnings df is called data1. I created three columns Day_0, Day_1 and Day_0_close. In the time column, the value is either amc or bmo. 'amc' means after market open and 'bmo' means before market open. In order for me to analyize earnings reaction on stock price. I need to possibly readjust the dates, which is why I created those new columns. For example bmo, since earnings are released before the market opens on the current day, i need to know yesterdays date and its closing price as Day_0. For amc, i need todays date's and closing price as Day_0_close. Eventually I need to get the next day prices but just keeping it to Day_0_close for now until I can resolve this issue.


         date symbol        eps  epsEstimated time       revenue  revenueEstimated Day_0 Day_1  Day_0_Close
0  2022-01-27  CMCSA   0.770000        0.7300  bmo  3.033600e+10      3.046110e+10                      0.0
1  2021-10-28  CMCSA   0.870000        0.7500  bmo  3.029800e+10      2.976570e+10                      0.0
2  2021-07-29  CMCSA   0.840000        0.6700  bmo  2.854600e+10      2.717460e+10                      0.0
3  2021-04-29  CMCSA   0.760000        0.5900  bmo  2.720500e+10      2.680920e+10                      0.0

I have another df called price1 which has all the stocks price data.

            Date        Open        High  ...   Adj Close    Volume  ticker
0     1980-03-17    0.000000    0.101881  ...    0.070243    138396   CMCSA
1     1980-03-18    0.000000    0.101881  ...    0.070243    530518   CMCSA
2     1980-03-19    0.000000    0.100798  ...    0.069462    738113   CMCSA
3     1980-03-20    0.000000    0.108385  ...    0.074925   1360895   CMCSA
4     1980-03-21    0.000000    0.111636  ...    0.077267    461320   CMCSA
...          ...         ...         ...  ...         ...       ...     ...
71942 2022-02-18  209.389999  210.750000  ...  206.160004  37049400      FB
71943 2022-02-22  202.339996  207.479996  ...  202.080002  39852400      FB

I then created a for loop to go through each row in data1, to pull the stock ticker and date and get prices. But now I'm getting an error "IndexError: index 0 is out of bounds for axis 0 with size 0" It debugs out at

day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0]. 

I don't know why it's erroring it out when sometimes the code works but stops several rows in.

See below

        date symbol   eps  epsEstimated time       revenue  revenueEstimated  \
0 2022-01-27  CMCSA  0.77          0.73  bmo  3.033600e+10      3.046110e+10   
1 2021-10-28  CMCSA  0.87          0.75  bmo  3.029800e+10      2.976570e+10   
2 2021-07-29  CMCSA  0.84          0.67  bmo  2.854600e+10      2.717460e+10   
3 2021-04-29  CMCSA  0.76          0.59  bmo  2.720500e+10      2.680920e+10   
4 2021-01-28  CMCSA  0.56          0.48  bmo  2.770800e+10      2.309000e+10   

        Day_0       Day_1  Day_0_Close  
0  2022-01-26  2022-01-27    48.459999  
1  2021-10-27  2021-10-28     0.000000  
2                             0.000000

Here is what i have so far on my for loop

for idx, row in data1.iterrows():

    orig_day = pd.to_datetime(row['date'])


    temp_day = orig_day + pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)
    prev_temp_day = orig_day - pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)

    if row['time'] == 'amc':
        data1.at[idx, 'Day_0'] = orig_day.strftime("%Y-%m-%d")
        data1.at[idx, 'Day_1'] = temp_day.strftime("%Y-%m-%d")
    else:
        data1.at[idx, 'Day_0'] = prev_temp_day.strftime("%Y-%m-%d")
        data1.at[idx, 'Day_1'] = orig_day.strftime("%Y-%m-%d")


    symbol = row['symbol']

    date_0 = row['Day_0']
    date_1 = row['Day_1']

    day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0]
    print(day_0_close)

    data1.at[idx, 'Day_0_Close'] = day_0_close

Thank you for any help you can give


Solution

  • This solution involves data collection as well, feel free to use this feature or just adapt the data merging using that specific part of the code.

    First, setting up the dataframe to test this solution:

    df = pd.DataFrame({'Date':['2022-01-27','2021-10-28','2021-07-29','2021-04-29','2021-01-28','2013-07-24','2013-05-01','2013-01-30','2012-10-23','2012-07-26'],
                       'symbol':['CMCSA','CMCSA','CMCSA','CMCSA','CMCSA','FB','FB','FB','FB','FB'],
                       'eps'             :[0.77,0.87,0.84,0.76,0.56,0.19,0.12,0.17,0.12,0.12],
                       'epsEstimated'    :[0.73,0.75,0.67,0.59,0.48,0.14,0.13,0.15,0.11,0.12],
                       'time'            :['bmo','bmo','bmo','bmo','bmo','amc','amc','amc','amc','amc'],
                       'revenue'         :[3.033600e+10,3.029800e+10,2.854600e+10,2.720500e+10,2.770800e+10,1.813000e+09,1.458000e+09,1.585000e+09,1.262000e+09,1.184000e+09],
                       'revenueEstimated':[3.046110e+10,3.046110e+10,2.717460e+10,2.680920e+10,2.309000e+10,1.335895e+09,1.579500e+09,1.398529e+09,1.156833e+09,1.184000e+09]})
    df['Date'] = pd.to_datetime(df['Date'])
    

    Please notice I named the Date column with a capital D.

    df
              Date  symbol   eps    epsEstimated    time     revenue revenueEstimated
    0   2022-01-27   CMCSA  0.77            0.73    bmo 3.033600e+10     3.046110e+10
    1   2021-10-28   CMCSA  0.87            0.75    bmo 3.029800e+10     3.046110e+10
    2   2021-07-29   CMCSA  0.84            0.67    bmo 2.854600e+10     2.717460e+10
    3   2021-04-29   CMCSA  0.76            0.59    bmo 2.720500e+10     2.680920e+10
    4   2021-01-28   CMCSA  0.56            0.48    bmo 2.770800e+10     2.309000e+10
    5   2013-07-24      FB  0.19            0.14    amc 1.813000e+09     1.335895e+09
    6   2013-05-01      FB  0.12            0.13    amc 1.458000e+09     1.579500e+09
    7   2013-01-30      FB  0.17            0.15    amc 1.585000e+09     1.398529e+09
    8   2012-10-23      FB  0.12            0.11    amc 1.262000e+09     1.156833e+09
    9   2012-07-26      FB  0.12            0.12    amc 1.184000e+09     1.184000e+09
    

    Downloading your database with OHLC information:

    import yfinance as yf
    
    df_ohlc = yf.download(df['symbol'].unique().tolist(), start=df['Date'].min())[['Open','High','Low','Close']]
    df_ohlc
    

    Output (could not format it properly using text, hence the figure):

    df_ohlc_figure_output

    Now, we stack the symbol level index, rename it and reset all indexes, we want both the symbol and Date index as columns, so we can properly merge the data:

    df_ohlc = df_ohlc.stack(level=1).reset_index().rename(columns={'level_1':'symbol'})
    data1 = df.merge(df_ohlc, how='inner', on=['Date','symbol'])
    

    Output:

    data1
    
              Date symbol    eps epsEstimated   time         revenue    revenueEstimated    Close        High         Low        Open
    0   2022-01-27  CMCSA   0.77    0.73         bmo    3.033600e+10    3.046110e+10    48.009998   50.070000   45.470001   45.470001
    1   2021-10-28  CMCSA   0.87    0.75         bmo    3.029800e+10    3.046110e+10    51.900002   52.740002   49.799999   50.400002
    2   2021-07-29  CMCSA   0.84    0.67         bmo    2.854600e+10    2.717460e+10    58.110001   59.700001   58.060001   59.200001
    3   2021-04-29  CMCSA   0.76    0.59         bmo    2.720500e+10    2.680920e+10    56.400002   56.490002   55.279999   55.980000
    4   2021-01-28  CMCSA   0.56    0.48         bmo    2.770800e+10    2.309000e+10    51.599998   52.290001   49.779999   50.000000
    5   2013-07-24     FB   0.19    0.14         amc    1.813000e+09    1.335895e+09    26.510000   26.530001   26.049999   26.320000
    6   2013-05-01     FB   0.12    0.13         amc    1.458000e+09    1.579500e+09    27.430000   27.920000   27.309999   27.850000
    7   2013-01-30     FB   0.17    0.15         amc    1.585000e+09    1.398529e+09    31.240000   31.490000   30.879999   30.980000
    8   2012-10-23     FB   0.12    0.11         amc    1.262000e+09    1.156833e+09    19.500000   19.799999   19.100000   19.250000
    9   2012-07-26     FB   0.12    0.12         amc    1.184000e+09    1.184000e+09    26.850000   28.230000   26.730000   27.750000
    

    Done: we got your corresponding OHLC values avoiding any kind of loops.