I'm working to create a database view that associates an online order with a users preceding website visits. This is for an eCommerce website so one user may visit and order several times.
I've already joined the visits table and orders table on user_id and associated the nearest less-than session time with an order time. Now, I'm hoping to say every visit up until order #1 is "1", and then after visit thereafter up until order #2 is "2". Also, if there is no order_id for that specific user I'd like to return "0". See screenshots linked below for reference.
I've already tried to use dense_rank, but it is only ranking the rows in which an order_id is present. I want to carry forward these ranks.
SELECT v.id AS visit_id,
v.user_id,
v.started_at AS visit_date,
dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
o.id AS order_id,
o.created_at AS order_date
FROM visits v
FULL JOIN orders o ON v.user_id = o.user_id AND v.started_at < o.created_at AND o.created_at < (( SELECT min(visits.started_at) AS min
FROM visits
WHERE visits.user_id = v.user_id AND visits.started_at > v.started_at)) AND (v.started_at + '24:00:00'::interval) > o.created_at
GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
ORDER BY v.started_at;
The GROUP BY
seems unnecessary, but I'll leave it it. You basically need a cumulative sum.
I would assign all visits before a particular order the order number:
SELECT v.id AS visit_id, v.user_id,
v.started_at AS visit_date,
dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
o.id AS order_id,
o.created_at AS order_date,
count(o.id) over (partition by v.user_id order by v.started_at) as order_number
FROM visits v FULL JOIN
orders o
ON v.user_id = o.user_id AND
v.started_at < o.created_at AND
o.created_at < (SELECT min(visits.started_at)
FROM visits v2
WHERE v2.user_id = v.user_id AND
v2.started_at > v.started_at) AND
(v.started_at + '24:00:00'::interval) > o.created_at
GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
ORDER BY v.started_at;
I think this is the logic that you want:
SELECT v.id AS visit_id, v.user_id,
v.started_at AS visit_date,
dense_rank() OVER (PARTITION BY v.user_id ORDER BY v.started_at) AS visit_number,
dense_rank() OVER (PARTITION BY v.user_id ORDER BY o.id) AS order_number,
o.id AS order_id,
o.created_at AS order_date,
MIN(o.order_number) OVER (PARTITION BY v.user_id ORDER BY v.started_at DESC) as order_number
FROM visits v FULL JOIN
(SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.id) as order_number
FROM orders o
) o
ON v.user_id = o.user_id AND
v.started_at < o.created_at AND
o.created_at < (SELECT min(visits.started_at)
FROM visits v2
WHERE v2.user_id = v.user_id AND
v2.started_at > v.started_at) AND
(v.started_at + '24:00:00'::interval) > o.created_at
GROUP BY v.id, v.user_id, v.started_at, o.id, o.created_at
ORDER BY v.started_at;
It might produces NULL
s where you want 0
s, however.