sqlwindow-functionsddlverticacase-statement

Get successive row values without loop in SQL


I have the following table:

AppId Id Direction Text Date
aaa 11 in hello 11/2/2021 3:03:00 PM
aaa 22 out yes? 11/2/2021 3:04:00 PM
aaa 33 in need help! 11/3/2021 3:06:00 PM
aaa 44 in you there? 11/4/2021 3:10:00 PM
aaa 55 out yes! 11/5/2021 4:00:00 PM
bb 111 out welcome! 11/6/2021 6:09:00 PM
bb 222 in can i call? 11/6/2021 6:39:00 PM
bb 333 out sure. 11/6/2021 8:22:00 PM
cc 1111 out hello? 11/8/2021 2:22:00 PM
cc 2222 in Whatsup! 11/8/2021 3:22:00 PM

Id is a primary key, direction is basically telling us if it is an incoming message or outgoing. App id is to identify various ids that are part of a single conversation. I want to identify the first response time after an incoming message as:

AppId Id Direction Text Date ReplyDate
aaa 11 in hello 11/2/2021 3:03:00 PM 11/2/2021 3:04:00 PM
aaa 22 out yes? 11/2/2021 3:04:00 PM null
aaa 33 in need help! 11/3/2021 3:06:00 PM 11/5/2021 4:00:00 PM
aaa 44 in you there? 11/4/2021 3:10:00 PM 11/5/2021 4:00:00 PM
aaa 55 out yes! 11/5/2021 4:00:00 PM null
bb 111 out welcome! 11/6/2021 6:09:00 PM null
bb 222 in can i call? 11/6/2021 6:39:00 PM 11/6/2021 8:22:00 PM
bb 333 out sure. 11/6/2021 8:22:00 PM null
cc 1111 out hello? 11/8/2021 2:22:00 PM null
cc 2222 in Whatsup! 11/8/2021 3:22:00 PM null

For any 'out' text the reply column is null, but for every 'in' text it brings the next latest timestamp for 'out'. If there is no outcoming text following an incoming text then the 'ReplyDate' for incoming is also null as in the case of 'cc'.

Can this be done in SQL. I am using vertica and it doesn't allow recursive or loop statements so I must achieve this without them.

I have been able to use lead () to get when the next one is outcoming text but not able to fill it for all previous incoming texts.

This is what I have tried so far but this doesn't get me the required result:

with cte as (
select 
row_number() over(partition by AppId order by date asc) as rn,
Id,
AppId
Direction,
Text,
Date,
lead(Direction, 1) over(order by Date asc) as lead_direction,
lead(Date, 1) over (order by Date asc) as lead_date,
from table
order by Date desc)
select 
Id,
AppId
Direction,
Text
Date,
case when Direction = 'Out' then null
     when lead_direction = null then null
     when rn <> 1 and Direction = 'In' and Direction = lead_direction then null
     when rn <> 1 and Direction = 'In' and Direction <> lead_direction then lead_date
     end as ReplyDate
from cte

Any help will be greatly appreciated.


Solution

  • Took a few attempts, but now I think I got it. The first CTE in the WITH clause is not part of the final query - it's just putting your original input into a self contained demo query.

    The real query, and the real WITH clause starts after that.

    As the closing 'out' row for each 'in' row can come one, two, or several rows later, this can be solved with a behavioural pattern: one or more 'in' rows, followed by one 'out' row. That's what the MATCH() clause is for. Only rows fulfilling the pattern are returned in the query containing the clause.

    The depending function PATTERN_ID() returns the ordinal number of the found pattern within the PARTITION BY... ORDER BY expression.

    GROUPing by the PARTITION BY columns and the PATTERN_ID will help me get the last timestamp of the pattern, which I need for the replyts (I changed the column names to avoid reserved words like DATE or TEXT).

    At the end, I just need to left join the indata CTE with the query containing the MATCH() clause, on equality of the id column and the direction being equal to 'in', and that second query with the grouping query.

    I left the intermediate result as an illustration of the mechanism within the CTE expressions ...

    WITH
    -- input from you ...
    indata(AppId,Id,Direction,txt,ts) AS (
              SELECT 'aaa',11,'in','hello',TIMESTAMP '11/2/2021 3:03:00 PM'
    UNION ALL SELECT 'aaa',22,'out','yes?',TIMESTAMP '11/2/2021 3:04:00 PM'
    UNION ALL SELECT 'aaa',33,'in','need help!',TIMESTAMP '11/3/2021 3:06:00 PM'
    UNION ALL SELECT 'aaa',44,'in','you there?',TIMESTAMP '11/4/2021 3:10:00 PM'
    UNION ALL SELECT 'aaa',55,'out','yes!',TIMESTAMP '11/5/2021 4:00:00 PM'
    UNION ALL SELECT 'bb',111,'out','welcome!',TIMESTAMP '11/6/2021 6:09:00 PM'
    UNION ALL SELECT 'bb',222,'in','can i call?',TIMESTAMP '11/6/2021 6:39:00 PM'
    UNION ALL SELECT 'bb',333,'out','sure.',TIMESTAMP '11/6/2021 8:22:00 PM'
    UNION ALL SELECT 'cc',1111,'out','hello?',TIMESTAMP '11/8/2021 2:22:00 PM'
    UNION ALL SELECT 'cc',2222,'in','Whatsup!',TIMESTAMP '11/8/2021 3:22:00 PM'
    )
    -- real query starts here, replace following comma with "WITH" ...
    ,
    -- the MATCH() clause in action - note the depending functions
    -- PATTERN_ID(), MATCH_ID() and EVENT_NAME()
    pattern_q AS (
      SELECT 
        appid
      , id
      , direction
      , txt
      , ts
      , PATTERN_ID()
      , MATCH_ID()
      , EVENT_NAME()
      FROM indata
      MATCH(
        PARTITION BY appid ORDER BY id
        DEFINE
          inbound  AS direction='in'
        , outbound AS direction='out'
        PATTERN
          p AS (inbound+ outbound)
      )
      -- out  appid   | id  | direction |     txt     |         ts          | PATTERN_ID | MATCH_ID | EVENT_NAME 
      -- out ---------+-----+-----------+-------------+---------------------+------------+----------+------------
      -- out  aaa     |  11 | in        | hello       | 2021-11-02 15:03:00 |          1 |        1 | inbound
      -- out  aaa     |  22 | out       | yes?        | 2021-11-02 15:04:00 |          1 |        2 | outbound
      -- out  aaa     |  33 | in        | need help!  | 2021-11-03 15:06:00 |          2 |        1 | inbound
      -- out  aaa     |  44 | in        | you there?  | 2021-11-04 15:10:00 |          2 |        2 | inbound
      -- out  aaa     |  55 | out       | yes!        | 2021-11-05 16:00:00 |          2 |        3 | outbound
      -- out  bb      | 222 | in        | can i call? | 2021-11-06 18:39:00 |          1 |        1 | inbound
      -- out  bb      | 333 | out       | sure.       | 2021-11-06 20:22:00 |          1 |        2 | outbound
    )
    ,
    -- need the last timestamp per PATTERN_ID() ... so grouping
    pattern_grp AS (
      SELECT
        appid
      , pattern_id
      , MIN(ts) AS g_ts
      , MAX(ts) AS replyts
      FROM pattern_q
      GROUP BY
        appid
      , pattern_id
      -- out  appid | pattern_id |         ts          |       replyts       
      -- out -------+------------+---------------------+---------------------
      -- out  aaa   |          1 | 2021-11-02 15:03:00 | 2021-11-02 15:04:00
      -- out  aaa   |          2 | 2021-11-03 15:06:00 | 2021-11-05 16:00:00
      -- out  bb    |          1 | 2021-11-06 18:39:00 | 2021-11-06 20:22:00
    )
    SELECT
      i.*
    , g.replyts
    FROM indata           i
    LEFT JOIN pattern_q   p ON i.id = p.id       AND i.direction='in'
    LEFT JOIN pattern_grp g ON p.appid = g.appid AND p.pattern_id = g.pattern_id 
    -- out Null display is "(null)".
    -- out  AppId |  Id  | Direction |     txt     |         ts          |       replyts       
    -- out -------+------+-----------+-------------+---------------------+---------------------
    -- out  aaa   |   11 | in        | hello       | 2021-11-02 15:03:00 | 2021-11-02 15:04:00
    -- out  aaa   |   22 | out       | yes?        | 2021-11-02 15:04:00 | (null)
    -- out  aaa   |   33 | in        | need help!  | 2021-11-03 15:06:00 | 2021-11-05 16:00:00
    -- out  aaa   |   44 | in        | you there?  | 2021-11-04 15:10:00 | 2021-11-05 16:00:00
    -- out  aaa   |   55 | out       | yes!        | 2021-11-05 16:00:00 | (null)
    -- out  bb    |  111 | out       | welcome!    | 2021-11-06 18:09:00 | (null)
    -- out  bb    |  222 | in        | can i call? | 2021-11-06 18:39:00 | 2021-11-06 20:22:00
    -- out  bb    |  333 | out       | sure.       | 2021-11-06 20:22:00 | (null)
    -- out  cc    | 1111 | out       | hello?      | 2021-11-08 14:22:00 | (null)
    -- out  cc    | 2222 | in        | Whatsup!    | 2021-11-08 15:22:00 | (null)