azureloggingkql

KQL merge rows with the same ID into one row


I have a lot of log data that has some duplicate values except for two columns. The simplified query looks like this:

AzureActivity
| project TimeGenerated, CorrelationId, Caller, Action, RoleId, ObjectId

This gives me a table that looks like this:

TimeGenerated CorrelationId Caller Action RoleId ObjectId Status
8/13/2024, 4:09:34.099 PM 1 John Doe Delete 222 333 Success
8/13/2024, 4:09:35.099 PM 1 John Doe Delete Start
8/15/2024, 8:09:34.099 PM 2 John Doe Write 444 555 Success
8/15/2024, 8:09:35.099 PM 2 John Doe Write Start
8/19/2024, 1:09:34.099 PM 3 John Doe Write 666 777 Success
8/19/2024, 1:09:36.099 PM 3 John Doe Write Start

I would like the table to look like this:

TimeGenerated CorrelationId Caller Action RoleId ObjectId Status
8/13/2024, 4:09:34.099 PM 1 John Doe Delete 222 333 Success
8/15/2024, 8:09:34.099 PM 2 John Doe Write 444 555 Success
8/19/2024, 1:09:34.099 PM 3 John Doe Write 666 777 Success

Anyone has any tips on how to get this to work?

I have tried to summarize by the correlation ID, but I will either still have duplicate rows or missing rows. I have tried to look for other solutions, but can't seem to find anything that works for this specific case.


Solution

  • Another (similar) solution would be to control which elements take_any() will choose with prev() and order by, like that:

    datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
    [
        datetime(8/13/2024, 4:09:35.099 PM), 1, 'John Doe', 'Delete', 111, long(null), 'Start',
        datetime(8/19/2024, 1:09:36.099 PM), 3, 'John Doe', 'Write', long(null), long(null), 'Start',
        datetime(8/15/2024, 8:09:35.099 PM), 2, 'John Doe', 'Write', long(null), long(null), 'Start',
        datetime(8/13/2024, 4:09:34.099 PM), 1, 'John Doe', 'Delete', 222,  333, 'Success',
        datetime(8/15/2024, 8:09:34.099 PM), 2, 'John Doe', 'Write',  444,  555, 'Success',
        datetime(8/19/2024, 1:09:34.099 PM), 3, 'John Doe', 'Write',  666,  777, 'Success',
    ]
    | order by CorrelationId, RoleId, ObjectId
    | where prev(CorrelationId) != CorrelationId
    | summarize take_any(*) by CorrelationId
    

    This Query will only choose non-empty fields (if existing) of RoleId and ObjectId. Without that take_any() would just choose random matching CorrelationIds.

    enter image description here