t-sql

How do I group my data based on time differences i.e. so I can seperate one group of 'events' from another


I have a table of user information, relating to users attempting to perform a certain activity and I'm trying to track & order events but I've hit a wall

Typically the there are 1 to 3+ failed events ending with a successful attempt, all within a short interval (~ 1-2 mins between each event). Then a break of sometimes days, then some more of the rapid events, rinse repeat. The successful event is recorded with a 'success' key. Sometimes they nail it first try.

The service is 24 hour so these events can occur anytime.

I can visually see the different 'event groups' from the timestamps but What I need to do is to be able to identify the event group 'boundaries' and build an 'event sequence', like below. Having the sequence numbering end on the successful event is ideal because it tells me exactly how many attempts it took the user before success.

timestamp success_key Event Sequence
2024-01-01 23:59:00 null 1
2024-01-02 00:01:00 null 2
2024-01-02 00:01:30 null 3
2024-01-02 00:04:00 4568279 4
2024-01-02 14:55:00 4569031 1
2024-01-02 19:03:00 null 1
2024-01-02 19:05:00 null 2
2024-01-02 19:06:00 4580842 3

Above is the 'ideal' table for me showing the attempts counting and the count terminating when the event was 'successful'.

I tried to use ROWNUMBER but on its own it just numbered everything sequentially. I tried LAG on the date, partitioning it by user but I just ended up with a load of date differences which I still couldn't programatically separate from each other.

Any advice for strategy would help enormously.

Thanks Ben


Solution

  • This is a classic Gaps-and-Islands problem.

    One simple solution is to mark the groups using a conditional running count, then use ROW_NUMBER with that as the partitioning clause.

    WITH grouped AS (
        SELECT *,
          ISNULL(COUNT(success_key) OVER (ORDER BY timestamp
              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS GroupId
        FROM YourTable t
    )
    SELECT *,
      ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY timestamp) AS Sequence
    FROM Grouped;