pythonpandasnumpystockohlc

How to get prior close when you have all stocks in a single DF?


Sorry for the noob question. I have a bunch of stocks in a sqlite3 database:

import pandas as pd
import sqlite3, config

connection = sqlite3.connect(config.db_file)
connection.row_factory = sqlite3.Row


df = pd.read_sql('SELECT * FROM stock_price', connection)

# sort the dataframe
df.sort_values(by='stock_id', inplace=True)

# # set the index to be this and don't drop
df.set_index(keys=['stock_id'], drop=False,inplace=True)

When I print the df, it gives me the following (where each stock_id refers to a unique stock, e.g APPL):

           id  stock_id        date    open     high     low    close   volume
stock_id                                                                      
1           1         1  2022-08-02    9.83    9.845    9.83    9.830   584772
1           2         1  2022-08-03    9.84    9.860    9.84    9.820     7711
4           3         4  2022-08-03   10.38   10.380   10.38   10.380      199
5          46         5  2022-08-03   34.75   35.200   34.75   35.200     1007
5          45         5  2022-08-02   34.32   34.550   34.32   34.442     1252
...       ...       ...         ...     ...      ...     ...      ...      ...
98          8        98  2022-08-02   28.00   28.095   27.90   28.000     2417
99         71        99  2022-08-02   88.19   88.940   87.15   88.370  1045596
99         72        99  2022-08-03   88.34   88.550   87.65   88.410   982710
100       171       100  2022-08-02  117.58  120.010  117.08  119.270    67795
100       172       100  2022-08-03  119.80  121.940  120.60  121.440     4237

[178 rows x 8 columns]

I need to target each unique stock_id individually, and get the prior close.

I know if each stock was in its own separate dataframe, I could do something like this:

final_df['previous close'] = final_df['c'].shift()

But when I've tried that, because everything in one dataframe, then you get one stock getting the previous close of an entirely different stock, which isn't what I want.

So my question:

What's the best to achieve splitting out all these different stocks from one single dataframe and being able to target them individually, and get the previous close price of each stock?


Solution

  • If I understand the question correctly, you just want the close of the previous row within each stock group. You can do that easily:

    newdf = df.assign(prev_close=df.groupby(level=0)['close'].shift())
    

    or, before setting the index to stock_id:

    newdf = df.assign(prev_close=df.groupby('stock_id')['close'].shift())
    

    On your data sample:

    >>> newdf
              id   stock_id date         open    high     low     close    volume   prev_close
    stock_id                                                                                  
    1           1    1       2022-08-02    9.83    9.845    9.83    9.830   584772     NaN    
    1           2    1       2022-08-03    9.84    9.860    9.84    9.820     7711    9.83    
    4           3    4       2022-08-03   10.38   10.380   10.38   10.380      199     NaN    
    5          46    5       2022-08-03   34.75   35.200   34.75   35.200     1007     NaN    
    5          45    5       2022-08-02   34.32   34.550   34.32   34.442     1252   35.20    
    98          8   98       2022-08-02   28.00   28.095   27.90   28.000     2417     NaN    
    99         71   99       2022-08-02   88.19   88.940   87.15   88.370  1045596     NaN    
    99         72   99       2022-08-03   88.34   88.550   87.65   88.410   982710   88.37    
    100       171  100       2022-08-02  117.58  120.010  117.08  119.270    67795     NaN    
    100       172  100       2022-08-03  119.80  121.940  120.60  121.440     4237  119.27
    

    Side note: try to not use inplace=True all over the place. It will cause you trouble in the long run (see e.g. here).

    Personally, I would prefer this format:

    cleandf = df.assign(
        date=pd.to_datetime(df['date'])
    ).set_index(['stock_id', 'date']).sort_index()
    newdf = cleandf.assign(prev_close=cleandf.groupby('stock_id')['close'].shift())
    >>> newdf
                         id   open    high     low     close    volume   prev_close
    stock_id date                                                                  
    1        2022-08-02    1    9.83    9.845    9.83    9.830   584772      NaN   
             2022-08-03    2    9.84    9.860    9.84    9.820     7711    9.830   
    4        2022-08-03    3   10.38   10.380   10.38   10.380      199      NaN   
    5        2022-08-02   45   34.32   34.550   34.32   34.442     1252      NaN   
             2022-08-03   46   34.75   35.200   34.75   35.200     1007   34.442   
    98       2022-08-02    8   28.00   28.095   27.90   28.000     2417      NaN   
    99       2022-08-02   71   88.19   88.940   87.15   88.370  1045596      NaN   
             2022-08-03   72   88.34   88.550   87.65   88.410   982710   88.370   
    100      2022-08-02  171  117.58  120.010  117.08  119.270    67795      NaN   
             2022-08-03  172  119.80  121.940  120.60  121.440     4237  119.270