I have an 'interactions' table that basically has a row for each time an id interacted. So there can be many repeated ids:
id | dateinteracted |
---|---|
1651 | September 15, 2017, 3:07 PM |
1366 | June 28, 2017, 8:32 PM |
1366 | July 3, 2017, 6:26 PM |
1366 | July 3, 2017, 6:22 PM |
1212 | August 24, 2017, 1:05 PM |
1366 | September 18, 2017, 9:15 AM |
I need to pull a rolling 90-day distinct count of ids, but only count ids that appear 3 or more times in that 90 days (high engagement). I'm having trouble converting a 90 day lookback query for 1 day into a rolling one that pulls for each passing day of the year.
(I'm pulling data from Redshift btw) First I wrote a sql query to get a distinct count of IDs from the last 90 days where the id interacted at least 3 times to be counted:
WITH cte AS
(
SELECT
id
,COUNT(id) "Count"
FROM interactions
WHERE 1=1
AND dateinteracted >= (current_date - 90)
AND dateinteracted < current_date
GROUP BY id
HAVING COUNT(id) >= 3
)
SELECT
COUNT(DISTINCT(id)) "DistinctCount"
FROM cte
I then tried to convert that to pull on a rolling basis. i.e. for each and every day I want the distinct count from the prior 90 days where engagement >=3. (a window function with RANGE is not available to me due to pulling this data from Amazon Redshift where it isn't supported):
WITH cte AS (
SELECT
A.dateinteracted,
A.id,
(SELECT
COUNT(B.id)
FROM interactions AS B
WHERE 1=1
AND B.id = A.id
AND B.dateinteracted >= (A.dateinteracted - 90)
AND B.dateinteracted < A.dateinteracted
) AS RollingCount
FROM interactions AS A
)
SELECT
CAST(dateinteracted as Date) "Date",
COUNT( DISTINCT(id)) "DistinctCount"
FROM cte
WHERE 1=1
AND RollingCount >= 3
GROUP BY
CAST(dateinteracted as Date)
ORDER BY 1 DESC
This code seems close, but looks to be undercounting. I'm not positive but I think the issue is that for each day, it is only looking back 90 days for ids that happen to come in on that day, and not all ids from the previous 90 days from that date. Any help would be massive thank you!
I will suppose solution without window function. Idea is self join table and count records with for each record and than for each date.
create table interactions( id int, dateinteracted datetime);
insert into interactions values
(1401,'2017-05-29 09:16:00')
,(1401,'2017-06-28 09:11:00')
,(1501,'2017-06-02 09:16:00')
,(1366,'2017-07-03 18:26:00')
,(1366,'2017-07-03 18:22:00')
,(1366,'2017-07-28 20:32:00')
,(1401,'2017-07-28 09:11:00')
,(1212,'2017-08-24 13:05:00')
,(1651,'2017-09-15 15:07:00')
,(1366,'2017-09-18 09:15:00')
,(1366,'2017-10-02 09:16:00')
,(1401,'2017-10-02 09:16:00')
;
with gr as(
select t1.id,t1.dateinteracted
,count(*) cnt
from interactions t1
left join interactions t2
on t1.id=t2.id
and t2.dateinteracted<t1.dateinteracted
and t2.dateinteracted>=(t1.dateinteracted-90)
group by t1.id,t1.dateinteracted
)
select cast(dateinteracted as date) cdate
,sum(case when cnt>=2 then 1 else 0 end) cnt_in_day
from gr
where cnt>=2
group by cast(dateinteracted as date)
Cte result
id | dateinteracted | cnt |
---|---|---|
1212 | 2017-08-24 13:05:00.000 | 1 |
1366 | 2017-07-03 18:22:00.000 | 1 |
1366 | 2017-07-03 18:26:00.000 | 1 |
1366 | 2017-07-28 20:32:00.000 | 2 |
1366 | 2017-09-18 09:15:00.000 | 3 |
1366 | 2017-10-02 09:16:00.000 | 2 |
1401 | 2017-05-29 09:16:00.000 | 1 |
1401 | 2017-06-28 09:11:00.000 | 1 |
1401 | 2017-07-28 09:11:00.000 | 2 |
1401 | 2017-10-02 09:16:00.000 | 1 |
1501 | 2017-06-02 09:16:00.000 | 1 |
1651 | 2017-09-15 15:07:00.000 | 1 |
Query result
cdate | cnt_in_day |
---|---|
2017-07-28 | 2 |
2017-09-18 | 1 |
2017-10-02 | 1 |
Or full result (without condition where cnt>=2)
cdate | cnt_in_day |
---|---|
2017-05-29 | 0 |
2017-06-02 | 0 |
2017-06-28 | 0 |
2017-07-03 | 0 |
2017-07-28 | 2 |
2017-08-24 | 0 |
2017-09-15 | 0 |
2017-09-18 | 1 |
2017-10-02 | 1 |
Example here Although query test example is for SQL Server, nothing specific is used, i think.
I'll update answer. New example counts activity with lag 90 days for every day in calendar. Calendar (dates) is list all of days from min(dateinteracted) to max(dateinteracted). You may use other calendar, if if necessary.
Next cte (act_in_day) joins all activities to calendar row with interactions on conditions dateinteracted from today to previous 90 days.
with dates as(
select 1 lvl,cast(min(cast(dateinteracted as date)) as datetime) as dt
,cast(max(cast(dateinteracted as date)) as datetime) maxdt from interactions
union all
select lvl+1,dt+1,maxdt from dates where dt+1<=maxdt --and lvl<10
)
,act_in_day as(
select *
from dates d
left join interactions i
on cast(d.dt as date)>=cast(i.dateinteracted as date)
and cast(d.dt as date)<cast((i.dateinteracted+90) as date)
)
,id_act_in_day as(
select id,dt
,count(*) cnt
from act_in_day
group by id,dt
)
,act_by_day as(
select dt
,sum(case when cnt>=3 then 1 else 0 end) cnt_in_day
from id_act_in_day
--where cnt>=2
group by dt
)
select * from act_by_day
order by dt
option (maxrecursion 300);