sqlmysqlmysql-8.0

How to Efficiently Find Consecutive Rows with Matching Criteria in SQL?


I have a dataset that tracks user activity, and I need to identify consecutive rows where a specific condition is met. The table structure and sample data are as follows:

Table: UserActivity

UserID Activity Timestamp
1 Login 2023-11-01 08:00:00
1 PageView 2023-11-01 08:10:00
1 Login 2023-11-01 08:15:00
1 Logout 2023-11-01 08:20:00
2 Login 2023-11-01 09:00:00
2 Login 2023-11-01 09:05:00
2 PageView 2023-11-01 09:10:00

I want to identify cases where consecutive rows for the same user have the same Activity, along with their timestamps. For example:

For UserID = 2, there are two consecutive Login activities. Like this.

Expected Output

UserID Activity StartTimestamp EndTimestamp
2 Login 2023-11-01 09:00:00 2023-11-01 09:05:00

I have attempted using a self-join to compare each row with the previous one, but it becomes computationally expensive for large datasets. I’ve also tried using LAG and LEAD functions, but I’m struggling to group consecutive rows efficiently.


Solution

  • Here is a possible solution using LAG. This solution assumes no more then 2 duplicates per UserID+Activity.

    WITH CTE AS
    (
    SELECT UserID, Activity, 
            LAG(USERID) OVER (ORDER BY TimeStamp)  as PrevID, 
            LAG(Activity) OVER (ORDER BY TimeStamp)  as PrevActivity, 
            LAG(TimeStamp) OVER (ORDER BY TimeStamp)  as StartTimeStamp, 
            TimeStamp as EndTimeStamp
    FROM Example
    ORDER BY TimeStamp
    )
    SELECT UserID, Activity, StartTimeStamp, EndTimeStamp
    FROM CTE
    WHERE UserID=PrevID AND Activity=PrevActivity
    ORDER BY StartTimeStamp
    

    fiddle

    UserID Activity StartTimeStamp EndTimeStamp
    2 Login 2023-11-01 09:00:00 2023-11-01 09:05:00