I'm having some difficulty understanding how to leverage Vertica's match clause to identify sessions in which a user searched for something on our site (event_category ='Search') and then saw a product carousel item (product_list ='banner' AND event_action ='impression').
Varying events are captured before, after, and during the pattern I'd like to identify, as the number of products that appear on a page and a user's engagement with our site vary can from session to session and user to user.
Raw Data Example
| hit_number | product_list | Event_Category | Event_Action | Event_Label |
|------------|----------------------|----------------|--------------|---------------|
| 105 | (null) | Search | Submit | chocolate |
| 106 | (null) | eec | impression | search-result |
| 107 | search-result | eec | impression | sendData |
| 107 | search-result | eec | impression | sendData |
| 107 | search-result | eec | impression | sendData |
| 107 | search-result | eec | impression | sendData |
| 108 | (null) | (null) | (null) | (null) |
| 109 | (null) | eec | impression | banner |
| 110 | banner-105-chocolate | eec | impression | sendData |
| 110 | banner-105-chocolate | eec | impression | sendData |
| 110 | banner-105-chocolate | eec | impression | sendData |
For the pattern to be valid, there must be at least 1 search event and 1 banner impression, I've set the pattern to (Search+ Banner+) to reflect this, but I'm not returning any results when I run execute the SQL query shown below.
SELECT
page_title
,event_label
,event_name()
,match_id()
,pattern_id()
FROM
(SELECT
unique_visit_id
,hit_number
,event_category
,event_label
,event_action
,product_list
FROM
atomic.ga_sessions_hits_product_expanded
WHERE
1=1
AND ga_sessions_date >= CURRENT_DATE -3
AND unique_visit_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
ORDER BY
hit_number ASC) base
Match
(Partition by unique_visit_id Order by hit_number
Define
Search as event_category ='Search' and event_action = 'Submit',
Banner as product_list ilike 'banner-%' and event_action ='impression'
Pattern
P as (Search+ BannerImpression+)
ROWS MATCH FIRST EVENT)
Please let me know if there's anything I should clarify, any insights or assistance would be greatly appreciated!
First, the column you're partitioning by is not in the example input. I added it and gave it the value 42 for all rows in your input data.
Your problem is that there are no patterns, in that data snippet, where an event that you named banner
immediately follows an event that you named search
I added yet another event into the DEFINE clause, at the end. If the other two don't evaluate to true, the last, which is just defined as other AS true
, will be picked (that's the behaviour of ROWS MATCH FIRST EVENT) .
And the pattern then becomes (search+ other* banner+)
, and that one is then found.
See here:
WITH
ga_sessions_hits_product_expanded(
unique_visit_id,hit_number,product_list,Event_Category,Event_Action,Event_Label
) AS (
SELECT 42,105,NULL,'Search','Submit','chocolate'
UNION ALL SELECT 42,106,NULL,'eec','impression','search-result'
UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
UNION ALL SELECT 42,108,NULL,NULL,NULL,NULL
UNION ALL SELECT 42,109,NULL,'eec','impression','banner'
UNION ALL SELECT 42,110,'banner-105-chocolate','eec','impression','sendData'
UNION ALL SELECT 42,110,'banner-105-chocolate','eec','impression','sendData'
UNION ALL SELECT 42,110,'banner-105-chocolate','eec','impression','sendData'
)
SELECT
*
, event_name()
, pattern_id()
, match_id()
FROM ga_sessions_hits_product_expanded
MATCH(
PARTITION BY unique_visit_id ORDER BY hit_number
DEFINE
search AS event_category='Search' AND event_action='Submit'
, banner AS product_list ILIKE 'banner-%' AND event_action='impression'
, other AS true
PATTERN p AS (search+ other* banner+)
ROWS MATCH FIRST EVENT
);
-- out Null display is "NULL".
-- out unique_visit_id | hit_number | product_list | Event_Category | Event_Action | Event_Label | event_name | pattern_id | match_id
-- out -----------------+------------+----------------------+----------------+--------------+---------------+------------+------------+----------
-- out 42 | 105 | NULL | Search | Submit | chocolate | search | 1 | 1
-- out 42 | 106 | NULL | eec | impression | search-result | other | 1 | 2
-- out 42 | 107 | search-result | eec | impression | sendData | other | 1 | 3
-- out 42 | 107 | search-result | eec | impression | sendData | other | 1 | 4
-- out 42 | 107 | search-result | eec | impression | sendData | other | 1 | 5
-- out 42 | 107 | search-result | eec | impression | sendData | other | 1 | 6
-- out 42 | 108 | NULL | NULL | NULL | NULL | other | 1 | 7
-- out 42 | 109 | NULL | eec | impression | banner | other | 1 | 8
-- out 42 | 110 | banner-105-chocolate | eec | impression | sendData | banner | 1 | 9
-- out 42 | 110 | banner-105-chocolate | eec | impression | sendData | banner | 1 | 10
-- out 42 | 110 | banner-105-chocolate | eec | impression | sendData | banner | 1 | 11
-- out (11 rows)
-- out
-- out Time: First fetch (11 rows): 50.632 ms. All rows formatted: 50.721 ms