azureazure-eventhubazure-stream-analytics

How to create time window based on more than just TIMESTAMP in FROM clause in Stream Analytics?


I have users who can take two actions:

  1. Enter the room
  2. Leave the room

I'm using Azure Stream Analytics and my goal is to create a table of users who entered the room but haven't left it in the next 30 minutes. The data have the following structure:

{
user_id: 'user_id',
event_name: 'Enter the room' | 'Leave the room',
event_timestamp: 'yyyy-mm-dd hh:mi:ss'
}

Example data (event hub stream):

User_id Event_name Event_timestamp
A Enter the room 2023-09-02 12:00:00
B Enter the room 2023-09-02 12:10:00
C Enter the room 2023-09-02 12:20:00
A Leave the room 2023-09-02 12:25:00
B Leave the room 2023-09-02 12:45:00

Desired output (output table):

User_id Event_name Event_timestamp
B Enter the room 2023-09-02 12:10:00
C Enter the room 2023-09-02 12:20:00

I've tried to use functionality of the window functions but they don't group by any parameter other than timestamp defined in the FROM clause. It results in creating time windows without taking in considerations what event ocurred nor who created the event.


Solution

  • You use below queries.

    WITH enter_room AS  (
    SELECT
        user_id,
        try_cast(event_timestamp as datetime)  as entry_time,
        event_name
    FROM
        inputdata
    WHERE
        event_name =  'Enter the room'
    ),
    
    leave_room AS  (
    SELECT
        user_id,
        try_cast(event_timestamp as datetime)  AS exit_time,
        event_name
    FROM
        inputdata
    WHERE
    event_name =  'Leave the room'
    )
    
    
    SELECT
        e.user_id,
        e.event_name,
        e.entry_time as event_timestamp
    into outdata
    FROM enter_room e
    left  JOIN leave_room l
    ON e.user_id = l.user_id
    AND
    DATEDIFF(minute, e, l)  BETWEEN 0 AND 30
    WHERE DATEDIFF(minute,e.entry_time,l.exit_time)  > 30 or  DATEDIFF(minute,e.entry_time,l.exit_time)  is  NULL;
    

    Here, i took entry_time and exit_time in separate records and joined them with difference of entry and exit time more than 30 minutes or null as condition.

    Output:

    enter image description here