I have this dataset which has a log of what happened to an issue within given time. I would like to label each status that it went to what level it has been during that time. Im doing it on python using pandas.
EDIT: Adding that each log is tied to a project and I have multiple project with different log. The below example is for 1 project only.
Date | Field | Old Value | New Value |
---|---|---|---|
2024-05-16 20:03:00 | Status | Development | QA |
2024-05-15 01:30:00 | Level | 3 | 2 |
2024-05-14 20:03:00 | Status | Work In Progress | Development |
2024-05-12 20:03:00 | Status | Investigate | Work in Progress |
2024-05-08 01:30:00 | Level | 4 | 3 |
2024-05-06 18:57:00 | Status | Scrub | Investigate |
2024-04-30 00:18:00 | Level | N/A | 4 |
This is the output I want:
Date | Field | Old Value | New Value | Level Spent |
---|---|---|---|---|
2024-05-16 20:03:00 | Status | Development | QA | 2 |
2024-05-15 01:30:00 | Level | 3 | 2 | 2 |
2024-05-14 20:03:00 | Status | Work In Progress | Development | 3 |
2024-05-12 20:03:00 | Status | Investigate | Work in Progress | 3 |
2024-05-08 01:30:00 | Level | 4 | 3 | 3 |
2024-05-06 18:57:00 | Status | Scrub | Investigate | 4 |
2024-04-30 00:18:00 | Level | N/A | 4 | 4 |
IIUC, sort the values by Date, select the values where
Field is Level, then ffill
:
df['Level Spent'] = (df.sort_values(by='Date')['New Value']
.where(df['Field'].eq('Level')).ffill()
)
Output:
Date Field Old Value New Value Level Spent
0 2024-05-16 20:03:00 Status Development QA 2
1 2024-05-15 01:30:00 Level 3 2 2
2 2024-05-14 20:03:00 Status Work In Progress Development 3
3 2024-05-12 20:03:00 Status Investigate Work in Progress 3
4 2024-05-08 01:30:00 Level 4 3 3
5 2024-05-06 18:57:00 Status Scrub Investigate 4
6 2024-04-30 00:18:00 Level NaN 4 4