azurekqlazure-data-explorerkusto-explorer

Kusto (KQL), How to pivot event rows that are grouped into single lines (by user and day), including event start and end times?


I hope somebody can help me with a tricky Kusto query.

I have the following dataset, which contains events raised by user, where events are represented with event code:

datatable(CreatedDate:datetime, User:string, EventCode:string)
[
    datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
    datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
    datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
    datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
    datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
    datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
    datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
    datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
    datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
    datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
    datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
    datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
    datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
    datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
    datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
    datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
]
| order by User asc, CreatedDate asc

I need to pivot this data to have a single day per user represented as a single line, which shows all of the complete events with their start and end date on a single line.

Event codes:

There can be intermediary events between TD and (TC or SP) (which don't affect anything), or there could be events sequences like TD, TD, TC, in which case only the last sequence should be taken into account.

And event in the output table should be represented by the event end code (TC or SP)

So the expected output for the data above would be this:

Created Day User Total Elapsed Event 1 Event 1 - From Time Event 1 - To Time Event 1 - Elapsed Event 2 Event 2 - From Time Event 2 - To Time Event 2 - Elapsed
2024-04-23 user1 00:02:31 SP 4:09:30.551 AM 4:09:59.067 AM 00:00:28 TC 7:10:02.052 AM 7:12:05.357 AM 00:02:03
2024-04-25 user1 00:01:54 TC 5:11:02.649 AM 5:12:56.672 AM 00:01:54
2024-04-25 user2 00:02:13 TC 4:36:33.656 AM 4:38:46.922 AM 00:02:13
2024-04-23 user3 00:01:14 TC 4:13:09.379 AM 4:14:23.724 AM 00:01:14
2024-04-25 user3 00:06:48 TC 4:34:18.966 AM 4:41:07.381 AM 00:06:48

There can be many more events raised per day for the user (up to 10-15 events maybe), which would stretch it horizontally. Also please don't mind date formatting, it can be anything, I just used the first available one.


Solution

  • Finally figured it out.

    Here is the query:

    let groupedEvents = datatable(CreatedDate:datetime, User:string, EventCode:string)
    [
        datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
        datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
        datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
        datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
        datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
        datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
        datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
        datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
        datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
        datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
        datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
        datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
        datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
        datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
        datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
        datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
    ]
    | order by User asc, CreatedDate asc
    | where EventCode in ('TD', 'TC', 'SP') 
    | extend EventStatus = iff(EventCode == 'TD', 'start', 'end')
    | sort by CreatedDate asc, User desc
    | extend NextUser = next(User), NextCreatedDate = next(CreatedDate), NextEventStatus = next(EventStatus), NextEventCode = next(EventCode)
    | extend CreatedDateOnly = bin(CreatedDate, 1d), NextCreatedDateOnly = bin(NextCreatedDate, 1d)
    | where User == NextUser and EventStatus == 'start' and NextEventStatus == 'end' and CreatedDateOnly == NextCreatedDateOnly
    | project CreatedDate = format_datetime(CreatedDateOnly, "yyyy-MM-dd"), User, Event = NextEventCode, From = format_datetime(CreatedDate, "HH:mm:ss"), To = format_datetime(NextCreatedDate, "HH:mm:ss"), Elapsed = bin(NextCreatedDate - CreatedDate, 1s );
    groupedEvents
    | summarize TotalElapsed = sum(Elapsed), EventsCount = count() by CreatedDate, User
    | join kind=inner  (
        groupedEvents
        | sort by CreatedDate asc, User asc
        | extend PrevCreatedDate = prev(CreatedDate), PrevUser = prev(User)
        | project CreatedDate, User, ColumnValue = strcat_delim('`',Event,From,To,Elapsed), EventNo = row_number(1, CreatedDate != PrevCreatedDate or User != PrevUser)
        | mv-apply with_itemindex = i split(ColumnValue, '`') on (
            extend i = i + 1 
        )
        | extend ColumnName = strcat("Event ", EventNo, case(i == 1, "", i == 2, " - From", i == 3, " - To", i == 4, " - Elapsed", "")) 
        | project-away EventNo, i
        | evaluate pivot (ColumnName, take_any(ColumnValue))
    ) on CreatedDate, User
    | project-away CreatedDate1, User1
    | sort by User asc, CreatedDate asc
    

    Here is the output:

    CreatedDate User TotalElapsed EventsCount Event 1 Event 1 - Elapsed Event 1 - From Event 1 - To Event 2 Event 2 - Elapsed Event 2 - From Event 2 - To
    2024-04-23 user1 00:02:31 2 SP 00:00:28 04:09:30 04:09:59 TC 00:02:03 07:10:02 07:12:05
    2024-04-25 user1 00:01:54 1 TC 00:01:54 05:11:02 05:12:56
    2024-04-25 user2 00:02:13 1 TC 00:02:13 04:36:33 04:38:46
    2024-04-23 user3 00:01:14 1 TC 00:01:14 04:13:09 04:14:23

    Hope it helps someone in the future as well