I have the following table:
AppId | Id | Direction | Text | Date |
---|---|---|---|---|
aaa | 11 | in | hello | 11/2/2021 3:03:00 PM |
aaa | 22 | out | yes? | 11/2/2021 3:04:00 PM |
aaa | 33 | in | need help! | 11/3/2021 3:06:00 PM |
aaa | 44 | in | you there? | 11/4/2021 3:10:00 PM |
aaa | 55 | out | yes! | 11/5/2021 4:00:00 PM |
bb | 111 | out | welcome! | 11/6/2021 6:09:00 PM |
bb | 222 | in | can i call? | 11/6/2021 6:39:00 PM |
bb | 333 | out | sure. | 11/6/2021 8:22:00 PM |
cc | 1111 | out | hello? | 11/8/2021 2:22:00 PM |
cc | 2222 | in | Whatsup! | 11/8/2021 3:22:00 PM |
Id is a primary key, direction is basically telling us if it is an incoming message or outgoing. App id is to identify various ids that are part of a single conversation. I want to identify the first response time after an incoming message as:
AppId | Id | Direction | Text | Date | ReplyDate |
---|---|---|---|---|---|
aaa | 11 | in | hello | 11/2/2021 3:03:00 PM | 11/2/2021 3:04:00 PM |
aaa | 22 | out | yes? | 11/2/2021 3:04:00 PM | null |
aaa | 33 | in | need help! | 11/3/2021 3:06:00 PM | 11/5/2021 4:00:00 PM |
aaa | 44 | in | you there? | 11/4/2021 3:10:00 PM | 11/5/2021 4:00:00 PM |
aaa | 55 | out | yes! | 11/5/2021 4:00:00 PM | null |
bb | 111 | out | welcome! | 11/6/2021 6:09:00 PM | null |
bb | 222 | in | can i call? | 11/6/2021 6:39:00 PM | 11/6/2021 8:22:00 PM |
bb | 333 | out | sure. | 11/6/2021 8:22:00 PM | null |
cc | 1111 | out | hello? | 11/8/2021 2:22:00 PM | null |
cc | 2222 | in | Whatsup! | 11/8/2021 3:22:00 PM | null |
For any 'out' text the reply column is null, but for every 'in' text it brings the next latest timestamp for 'out'. If there is no outcoming text following an incoming text then the 'ReplyDate' for incoming is also null as in the case of 'cc'.
Can this be done in SQL. I am using vertica and it doesn't allow recursive or loop statements so I must achieve this without them.
I have been able to use lead () to get when the next one is outcoming text but not able to fill it for all previous incoming texts.
This is what I have tried so far but this doesn't get me the required result:
with cte as (
select
row_number() over(partition by AppId order by date asc) as rn,
Id,
AppId
Direction,
Text,
Date,
lead(Direction, 1) over(order by Date asc) as lead_direction,
lead(Date, 1) over (order by Date asc) as lead_date,
from table
order by Date desc)
select
Id,
AppId
Direction,
Text
Date,
case when Direction = 'Out' then null
when lead_direction = null then null
when rn <> 1 and Direction = 'In' and Direction = lead_direction then null
when rn <> 1 and Direction = 'In' and Direction <> lead_direction then lead_date
end as ReplyDate
from cte
Any help will be greatly appreciated.
Took a few attempts, but now I think I got it.
The first CTE in the WITH
clause is not part of the final query - it's just putting your original input into a self contained demo query.
The real query, and the real WITH
clause starts after that.
As the closing 'out' row for each 'in' row can come one, two, or several rows later, this can be solved with a behavioural pattern: one or more 'in' rows, followed by one 'out' row. That's what the MATCH()
clause is for.
Only rows fulfilling the pattern are returned in the query containing the clause.
The depending function PATTERN_ID()
returns the ordinal number of the found pattern within the PARTITION BY... ORDER BY
expression.
GROUPing by the PARTITION BY
columns and the PATTERN_ID
will help me get the last timestamp of the pattern, which I need for the replyts
(I changed the column names to avoid reserved words like DATE
or TEXT
).
At the end, I just need to left join the indata
CTE with the query containing the MATCH()
clause, on equality of the id
column and the direction
being equal to 'in', and that second query with the grouping query.
I left the intermediate result as an illustration of the mechanism within the CTE expressions ...
WITH
-- input from you ...
indata(AppId,Id,Direction,txt,ts) AS (
SELECT 'aaa',11,'in','hello',TIMESTAMP '11/2/2021 3:03:00 PM'
UNION ALL SELECT 'aaa',22,'out','yes?',TIMESTAMP '11/2/2021 3:04:00 PM'
UNION ALL SELECT 'aaa',33,'in','need help!',TIMESTAMP '11/3/2021 3:06:00 PM'
UNION ALL SELECT 'aaa',44,'in','you there?',TIMESTAMP '11/4/2021 3:10:00 PM'
UNION ALL SELECT 'aaa',55,'out','yes!',TIMESTAMP '11/5/2021 4:00:00 PM'
UNION ALL SELECT 'bb',111,'out','welcome!',TIMESTAMP '11/6/2021 6:09:00 PM'
UNION ALL SELECT 'bb',222,'in','can i call?',TIMESTAMP '11/6/2021 6:39:00 PM'
UNION ALL SELECT 'bb',333,'out','sure.',TIMESTAMP '11/6/2021 8:22:00 PM'
UNION ALL SELECT 'cc',1111,'out','hello?',TIMESTAMP '11/8/2021 2:22:00 PM'
UNION ALL SELECT 'cc',2222,'in','Whatsup!',TIMESTAMP '11/8/2021 3:22:00 PM'
)
-- real query starts here, replace following comma with "WITH" ...
,
-- the MATCH() clause in action - note the depending functions
-- PATTERN_ID(), MATCH_ID() and EVENT_NAME()
pattern_q AS (
SELECT
appid
, id
, direction
, txt
, ts
, PATTERN_ID()
, MATCH_ID()
, EVENT_NAME()
FROM indata
MATCH(
PARTITION BY appid ORDER BY id
DEFINE
inbound AS direction='in'
, outbound AS direction='out'
PATTERN
p AS (inbound+ outbound)
)
-- out appid | id | direction | txt | ts | PATTERN_ID | MATCH_ID | EVENT_NAME
-- out ---------+-----+-----------+-------------+---------------------+------------+----------+------------
-- out aaa | 11 | in | hello | 2021-11-02 15:03:00 | 1 | 1 | inbound
-- out aaa | 22 | out | yes? | 2021-11-02 15:04:00 | 1 | 2 | outbound
-- out aaa | 33 | in | need help! | 2021-11-03 15:06:00 | 2 | 1 | inbound
-- out aaa | 44 | in | you there? | 2021-11-04 15:10:00 | 2 | 2 | inbound
-- out aaa | 55 | out | yes! | 2021-11-05 16:00:00 | 2 | 3 | outbound
-- out bb | 222 | in | can i call? | 2021-11-06 18:39:00 | 1 | 1 | inbound
-- out bb | 333 | out | sure. | 2021-11-06 20:22:00 | 1 | 2 | outbound
)
,
-- need the last timestamp per PATTERN_ID() ... so grouping
pattern_grp AS (
SELECT
appid
, pattern_id
, MIN(ts) AS g_ts
, MAX(ts) AS replyts
FROM pattern_q
GROUP BY
appid
, pattern_id
-- out appid | pattern_id | ts | replyts
-- out -------+------------+---------------------+---------------------
-- out aaa | 1 | 2021-11-02 15:03:00 | 2021-11-02 15:04:00
-- out aaa | 2 | 2021-11-03 15:06:00 | 2021-11-05 16:00:00
-- out bb | 1 | 2021-11-06 18:39:00 | 2021-11-06 20:22:00
)
SELECT
i.*
, g.replyts
FROM indata i
LEFT JOIN pattern_q p ON i.id = p.id AND i.direction='in'
LEFT JOIN pattern_grp g ON p.appid = g.appid AND p.pattern_id = g.pattern_id
-- out Null display is "(null)".
-- out AppId | Id | Direction | txt | ts | replyts
-- out -------+------+-----------+-------------+---------------------+---------------------
-- out aaa | 11 | in | hello | 2021-11-02 15:03:00 | 2021-11-02 15:04:00
-- out aaa | 22 | out | yes? | 2021-11-02 15:04:00 | (null)
-- out aaa | 33 | in | need help! | 2021-11-03 15:06:00 | 2021-11-05 16:00:00
-- out aaa | 44 | in | you there? | 2021-11-04 15:10:00 | 2021-11-05 16:00:00
-- out aaa | 55 | out | yes! | 2021-11-05 16:00:00 | (null)
-- out bb | 111 | out | welcome! | 2021-11-06 18:09:00 | (null)
-- out bb | 222 | in | can i call? | 2021-11-06 18:39:00 | 2021-11-06 20:22:00
-- out bb | 333 | out | sure. | 2021-11-06 20:22:00 | (null)
-- out cc | 1111 | out | hello? | 2021-11-08 14:22:00 | (null)
-- out cc | 2222 | in | Whatsup! | 2021-11-08 15:22:00 | (null)