I am still pretty new to pythons. I am trying to create a python code to pull the earliest extraction as start time. When the conditon becomes Y , then I would like to create an end date based on condition:
Sample Data table:
ID | SUB ID | Extraction Date | Conditon met (Y/N) |
---|---|---|---|
1 | x | 2023-01-02 | N |
1 | y | 2023-01-02 | N |
1 | x | 2023-02-15 | N |
2 | v | 2023-02-15 | N |
2 | z | 2023-02-15 | N |
1 | x | 2023-03-15 | Y |
2 | v | 2023-03-15 | Y |
I have tried a few ways to pull the data but not luck:
df['start date'] = df.groupby(['ID','SUB ID'], as_index = False)['Extraction Date].transform(min).reset_index()
df['start date'] = df.groupby(['ID','SUB ID'], as_index = False)['Extraction Date].min()
df['end date'] = df.query(Condition met (Y/N) == Y).groupby(['ID','SUB ID'], as_index = False['Extraction Date].transform(max).reset_index()
df['end date'] = df.query(Condition met (Y/N) == Y).groupby(['ID','SUB ID'], as_index = False['Extraction Date].min()
What I would like to get:
ID | SUB ID | Extraction Date | Conditon met (Y/N) | Start Date | End Date |
---|---|---|---|---|---|
1 | x | 2023-01-02 | N | 2023-01-02 | N/A |
1 | y | 2023-01-02 | N | 2023-01-02 | N/A |
1 | x | 2023-02-15 | N | 2023-01-02 | N/A |
2 | v | 2023-02-15 | N | 2023-02-15 | N/A |
2 | z | 2023-02-15 | N | 2023-02-15 | N/A |
1 | x | 2023-03-15 | Y | 2023-01-02 | 2023-03-15 |
2 | v | 2023-03-15 | Y | 2023-02-15 | 2023-03-15 |
You can use groupby
and transform("min")
to get the Start Date
, and loc
to get the End Date
:
df["Start Date"] = df.groupby(["ID", "SUB ID"])["Extraction Date"].transform("min")
df.loc[df["Condition met (Y/N)"] == "Y", "End Date"] = df["Extraction Date"]
ID SUB ID Extraction Date Condition met (Y/N) Start Date End Date
0 1 x 2023-01-02 N 2023-01-02 NaT
1 1 y 2023-01-02 N 2023-01-02 NaT
2 1 x 2023-02-15 N 2023-01-02 NaT
3 2 v 2023-02-15 N 2023-02-15 NaT
4 2 z 2023-02-15 N 2023-02-15 NaT
5 1 x 2023-03-15 Y 2023-01-02 2023-03-15
6 2 v 2023-03-15 Y 2023-02-15 2023-03-15