python sql pandas snowflake-cloud-data-platform

Converting python logic to sql query (Pairing two status from one column)

I need help with converting my python code to SQL:

req_id_mem = ""
req_workflow_mem = ""

collect_state_main = []
collect_state_temp = []

for req_id, req_datetime, req_workflow in zip(df["TICKET_ID"], df["DATETIMESTANDARD"], df["STATUS"]):
    if req_id_mem == "" or req_id_mem != req_id:
        req_id_mem = req_id
        req_workflow_mem = ""
        collect_state_temp = []
    
    if req_workflow_mem == "" and req_workflow == "Open" and req_id_mem == req_id:
        req_workflow_mem = req_workflow
        collect_state_temp.append(req_id)
        collect_state_temp.append(req_workflow)
        collect_state_temp.append(req_datetime)
        
    if req_workflow_mem == "Open" and req_workflow == "Closed" and req_id_mem == req_id:
        req_workflow_mem = req_workflow
        collect_state_temp.append(req_workflow)
        collect_state_temp.append(req_datetime)
        collect_state_main.append(collect_state_temp)
        collect_state_temp = []

DataFrame:

TICKET_ID	DATETIMESTANDARD	STATUS
79355138	9/3/2024 11:54:18 AM	Open
79355138	9/3/2024 9:01:12 PM	Open
79355138	9/6/2024 4:52:10 PM	Closed
79355138	9/6/2024 4:52:12 PM	Open
79355138	9/10/2024 4:01:24 PM	Closed
79446344	8/27/2024 1:32:54 PM	Open
79446344	9/11/2024 9:40:17 AM	Closed
79446344	9/11/2024 9:40:24 AM	Closed
79446344	9/11/2024 9:42:14 AM	Open

Result:

It will Identify the first Open State of a TICKET_ID and look for the closest Closed Status
It will reiterate for each case to look for an Open and Closed pair (first open and first close will only be considered)

My problem is I'm stuck since the pairings can happen more than twice. I tried Rank in sql but it only return the first instance of pairing but not the other pairs

Adding also my solution to this one as I migrated to snowflake recently:

SELECT
    FOD.TICKET_ID,
    FOD.FIRSTOPENDATETIME AS OPEN_DATETIME,
    MIN(NC.DATETIMESTANDARD) AS CLOSED_DATETIME
FROM
    (
    SELECT
        TICKET_ID,
        MIN(DATETIMESTANDARD) AS FIRSTOPENDATETIME,
        STATUS
    FROM
        DB.TABLE
    WHERE
        (
        (STATUS IN ('Open') AND EVENT_TYPE IN ('Ticket Open'))
        OR STATUS IN ('Closed')
        )
    GROUP BY
        TICKET_ID, STATUS
    ) AS FOD
LEFT JOIN DB.TABLE AS NC ON FOD.TICKET_ID = NC.TICKET_ID AND NC.STATUS = 'Closed' AND NC.DATETIMESTANDARD > FOD.FIRSTOPENDATETIME
WHERE
FOD.STATUS = 'Open'
GROUP BY
FOD.TICKET_ID, FOD.FIRSTOPENDATETIME
ORDER BY
FOD.TICKET_ID ASC, FOD.FIRSTOPENDATETIME ASC

Solution

One way of doing it:

SELECT 
  FOD.TICKET_ID
  , FOD.FIRSTOPENDATETIME
  , (SELECT NC.DATETIMESTANDARD 
     from MyTbl NC -- Nearest future close date
     where NC.TICKET_ID=FOD.TICKET_ID
       and NC.STUS='Closed'
       and NC.DATETIMESTANDARD>FOD.FIRSTOPENDATETIME
       and DATETIMESTANDARD=(select min(DATETIMESTANDARD)
                             from MyTbl NCb
                             where NCb.TICKET_ID=NC.TICKET_ID
                               and NCb.STUS='Closed'
                               and NCb.DATETIMESTANDARD > FOD.FIRSTOPENDATETIME)
      ) as NearestFutureClosedDate
from  (select  TICKET_ID
             , MIN(DATETIMESTANDARD) as FIRSTOPENDATETIME
      from MyTbl
      group by TICKET_ID) as FOD

The main query selects the earliest Open rows for each ticket, and the subquery finds the earliest Closed rows following those.

Update: I just realised that you only want the date (and no other columns) from the nearest close date row), so it can be simplifed to:

SELECT 
  FOD.TICKET_ID
  , FOD.FIRSTOPENDATETIME
  , (select min(DATETIMESTANDARD)
     from #MyTbl NC
     where FOD.TICKET_ID=NC.TICKET_ID
     and NC.STUS='Closed'
     and NC.DATETIMESTANDARD > FOD.FIRSTOPENDATETIME
      ) as NearestFutureClosedDate
from  (select  TICKET_ID
             , MIN(DATETIMESTANDARD) as FIRSTOPENDATETIME
      from #MyTbl
      group by TICKET_ID) as FOD

;

You can also use CTEs but it doesn't mean that it will change the execution plan. Just to prove the point, I moved the subquery obtaining 'first open date' into a CTE, and compared the execution plans:

with FOD as (
     select  TICKET_ID
             , MIN(DATETIMESTANDARD) as FIRSTOPENDATETIME
      from #MyTbl
      group by TICKET_ID
)
SELECT 
  FOD.TICKET_ID
  , FOD.FIRSTOPENDATETIME
  , (select min(DATETIMESTANDARD)
     from #MyTbl NC
     where FOD.TICKET_ID=NC.TICKET_ID
     and NC.STUS='Closed'
     and NC.DATETIMESTANDARD > FOD.FIRSTOPENDATETIME
      ) as NearestFutureClosedDate
from  FOD

These are exactly the same for this example; we can have different outcomes depending on the data statistics, rdbms engine, system resources, etc, but it proves that defining some part of your query as a 'subquery' doesn't instruct the engine to 'create a table'.