pythonpandasdataframe

Creating start and end date - Python


I am still pretty new to pythons. I am trying to create a python code to pull the earliest extraction as start time. When the conditon becomes Y , then I would like to create an end date based on condition:

Sample Data table:

ID SUB ID Extraction Date Conditon met (Y/N)
1 x 2023-01-02 N
1 y 2023-01-02 N
1 x 2023-02-15 N
2 v 2023-02-15 N
2 z 2023-02-15 N
1 x 2023-03-15 Y
2 v 2023-03-15 Y

I have tried a few ways to pull the data but not luck:

df['start date'] = df.groupby(['ID','SUB ID'], as_index = False)['Extraction Date].transform(min).reset_index()

df['start date'] = df.groupby(['ID','SUB ID'], as_index = False)['Extraction Date].min()

df['end date'] = df.query(Condition met (Y/N) == Y).groupby(['ID','SUB ID'], as_index = False['Extraction Date].transform(max).reset_index()

df['end date'] = df.query(Condition met (Y/N) == Y).groupby(['ID','SUB ID'], as_index = False['Extraction Date].min()

What I would like to get:

ID SUB ID Extraction Date Conditon met (Y/N) Start Date End Date
1 x 2023-01-02 N 2023-01-02 N/A
1 y 2023-01-02 N 2023-01-02 N/A
1 x 2023-02-15 N 2023-01-02 N/A
2 v 2023-02-15 N 2023-02-15 N/A
2 z 2023-02-15 N 2023-02-15 N/A
1 x 2023-03-15 Y 2023-01-02 2023-03-15
2 v 2023-03-15 Y 2023-02-15 2023-03-15

Solution

  • You can use groupby and transform("min") to get the Start Date, and loc to get the End Date:

    df["Start Date"] = df.groupby(["ID", "SUB ID"])["Extraction Date"].transform("min")
    df.loc[df["Condition met (Y/N)"] == "Y", "End Date"] = df["Extraction Date"]
    
       ID SUB ID Extraction Date Condition met (Y/N) Start Date   End Date
    0   1      x      2023-01-02                   N 2023-01-02        NaT
    1   1      y      2023-01-02                   N 2023-01-02        NaT
    2   1      x      2023-02-15                   N 2023-01-02        NaT
    3   2      v      2023-02-15                   N 2023-02-15        NaT
    4   2      z      2023-02-15                   N 2023-02-15        NaT
    5   1      x      2023-03-15                   Y 2023-01-02 2023-03-15
    6   2      v      2023-03-15                   Y 2023-02-15 2023-03-15