I need help with converting my python code to SQL:
req_id_mem = ""
req_workflow_mem = ""
collect_state_main = []
collect_state_temp = []
for req_id, req_datetime, req_workflow in zip(df["TICKET_ID"], df["DATETIMESTANDARD"], df["STATUS"]):
if req_id_mem == "" or req_id_mem != req_id:
req_id_mem = req_id
req_workflow_mem = ""
collect_state_temp = []
if req_workflow_mem == "" and req_workflow == "Open" and req_id_mem == req_id:
req_workflow_mem = req_workflow
collect_state_temp.append(req_id)
collect_state_temp.append(req_workflow)
collect_state_temp.append(req_datetime)
if req_workflow_mem == "Open" and req_workflow == "Closed" and req_id_mem == req_id:
req_workflow_mem = req_workflow
collect_state_temp.append(req_workflow)
collect_state_temp.append(req_datetime)
collect_state_main.append(collect_state_temp)
collect_state_temp = []
DataFrame:
TICKET_ID | DATETIMESTANDARD | STATUS |
---|---|---|
79355138 | 9/3/2024 11:54:18 AM | Open |
79355138 | 9/3/2024 9:01:12 PM | Open |
79355138 | 9/6/2024 4:52:10 PM | Closed |
79355138 | 9/6/2024 4:52:12 PM | Open |
79355138 | 9/10/2024 4:01:24 PM | Closed |
79446344 | 8/27/2024 1:32:54 PM | Open |
79446344 | 9/11/2024 9:40:17 AM | Closed |
79446344 | 9/11/2024 9:40:24 AM | Closed |
79446344 | 9/11/2024 9:42:14 AM | Open |
Result:
My problem is I'm stuck since the pairings can happen more than twice. I tried Rank in sql but it only return the first instance of pairing but not the other pairs
One way of doing it:
SELECT
FOD.TICKET_ID
, FOD.FIRSTOPENDATETIME
, (SELECT NC.DATETIMESTANDARD
from MyTbl NC -- Nearest future close date
where NC.TICKET_ID=FOD.TICKET_ID
and NC.STUS='Closed'
and NC.DATETIMESTANDARD>FOD.FIRSTOPENDATETIME
and DATETIMESTANDARD=(select min(DATETIMESTANDARD)
from MyTbl NCb
where NCb.TICKET_ID=NC.TICKET_ID
and NCb.STUS='Closed'
and NCb.DATETIMESTANDARD > FOD.FIRSTOPENDATETIME)
) as NearestFutureClosedDate
from (select TICKET_ID
, MIN(DATETIMESTANDARD) as FIRSTOPENDATETIME
from MyTbl
group by TICKET_ID) as FOD
The main query selects the earliest Open rows for each ticket, and the subquery finds the earliest Closed rows following those.
Update: I just realised that you only want the date (and no other columns) from the nearest close date row), so it can be simplifed to:
SELECT
FOD.TICKET_ID
, FOD.FIRSTOPENDATETIME
, (select min(DATETIMESTANDARD)
from #MyTbl NC
where FOD.TICKET_ID=NC.TICKET_ID
and NC.STUS='Closed'
and NC.DATETIMESTANDARD > FOD.FIRSTOPENDATETIME
) as NearestFutureClosedDate
from (select TICKET_ID
, MIN(DATETIMESTANDARD) as FIRSTOPENDATETIME
from #MyTbl
group by TICKET_ID) as FOD
;
You can also use CTEs but it doesn't mean that it will change the execution plan. Just to prove the point, I moved the subquery obtaining 'first open date' into a CTE, and compared the execution plans:
with FOD as (
select TICKET_ID
, MIN(DATETIMESTANDARD) as FIRSTOPENDATETIME
from #MyTbl
group by TICKET_ID
)
SELECT
FOD.TICKET_ID
, FOD.FIRSTOPENDATETIME
, (select min(DATETIMESTANDARD)
from #MyTbl NC
where FOD.TICKET_ID=NC.TICKET_ID
and NC.STUS='Closed'
and NC.DATETIMESTANDARD > FOD.FIRSTOPENDATETIME
) as NearestFutureClosedDate
from FOD
These are exactly the same for this example; we can have different outcomes depending on the data statistics, rdbms engine, system resources, etc, but it proves that defining some part of your query as a 'subquery' doesn't instruct the engine to 'create a table'.