pandasnested-for-loop

Create a new columns based on multiple conditions between two different dataframes with different dimensions


I am trying to build a stock portfolio. I have mainly 2 dfs: 1 with my transactions and 1 with the stock prices of the individual stocks. My transactions df looks like this:

        Date    Ticker   Position
0   2022-11-01  MSFT     20
1   2022-11-15  PG       10
2   2022-11-25  JNJ      10
3   2022-11-22  MSFT     10

The position column indicates how many shares where purchased. So in the third row I bought a second position in MSFT, adding 10 stocks, taking my total to 30 stocks in MSFT.

My stock prices df looks like this (from yfinance):

    Ticker  Adj     Close
Date        
2022-11-01  MSFT    227.528793
2022-11-02  MSFT    219.481476
2022-11-03  MSFT    213.647903
2022-11-04  MSFT    220.767838
2022-11-07  MSFT    227.229630
... ... ...
2022-12-05  JNJ     178.779999
2022-12-06  JNJ     176.100006
2022-12-07  JNJ     177.169998
2022-12-08  JNJ     177.199997
2022-12-09  JNJ     175.740005

I would like to add a column to my stock prices df called Position which shows stock positions at that time.

I think this should not be so difficult with a double condition: If prices_date >= transaction_date AND transaction_ticker = prices_ticker THEN prices_postion = prices_position + transaction_position

I thought of initially loading an new column with all zeros, which should should allow for a simple addition (or subtraction if shares were sold)

I tried resetting the index so that column to allow for better comparison, nested for loops, but only errors occurred. I cannot wrap my head around how to do it in Python nor have I found an answer online.

Any suggestions are appreciated


Solution

  • Here's a solution that relies on merge, fillna, and groupby.cumsum():

    import pandas as pd
    import numpy as np
    
    df1 = pd.DataFrame(
        {
            "Date": pd.to_datetime(
                ["2022-11-01", "2022-11-15", "2022-12-07", "2022-11-04"]
            ),
            "Ticker": ["MSFT", "JNJ", "JNJ", "MSFT"],
            "Position": [20, 10, 10, 10],
        }
    )
    df1
    
    df2 = pd.DataFrame(
        {
            "Date": pd.to_datetime(
                [
                    "2022-11-01",
                    "2022-11-02",
                    "2022-11-03",
                    "2022-11-04",
                    "2022-11-07",
                    "2022-12-05",
                    "2022-12-06",
                    "2022-12-07",
                    "2022-12-08",
                    "2022-12-09",
                ]
            ),
            "Ticker": [
                "MSFT",
                "MSFT",
                "MSFT",
                "MSFT",
                "MSFT",
                "JNJ",
                "JNJ",
                "JNJ",
                "JNJ",
                "JNJ",
            ],
            "Close": [
                227.528793,
                219.481476,
                213.647903,
                220.767838,
                227.229630,
                178.779999,
                176.100006,
                177.169998,
                177.199997,
                175.740005,
            ],
        }
    )
    df3 = df2.merge(df1,how='outer',on=['Ticker','Date'])
    df3 = df3.sort_values(by='Date')
    df3 = df3.dropna(axis=0,subset='Close')
    df3['Position'] = df3['Position'].fillna(0)
    df3['Position'] = df3.groupby('Ticker')['Position'].cumsum()
    df3
    

    Result:

        Date    Ticker  Close   Position
    0   2022-11-01  MSFT    227.528793  20.0
    1   2022-11-02  MSFT    219.481476  20.0
    2   2022-11-03  MSFT    213.647903  20.0
    3   2022-11-04  MSFT    220.767838  30.0
    4   2022-11-07  MSFT    227.22963   30.0
    5   2022-12-05  JNJ     178.779999  0.0
    6   2022-12-06  JNJ     176.100006  0.0
    7   2022-12-07  JNJ     177.169998  10.0
    8   2022-12-08  JNJ     177.199997  10.0
    9   2022-12-09  JNJ     175.740005  10.0