I have a lot of log data that has some duplicate values except for two columns. The simplified query looks like this:
AzureActivity
| project TimeGenerated, CorrelationId, Caller, Action, RoleId, ObjectId
This gives me a table that looks like this:
TimeGenerated | CorrelationId | Caller | Action | RoleId | ObjectId | Status |
---|---|---|---|---|---|---|
8/13/2024, 4:09:34.099 PM | 1 | John Doe | Delete | 222 | 333 | Success |
8/13/2024, 4:09:35.099 PM | 1 | John Doe | Delete | Start | ||
8/15/2024, 8:09:34.099 PM | 2 | John Doe | Write | 444 | 555 | Success |
8/15/2024, 8:09:35.099 PM | 2 | John Doe | Write | Start | ||
8/19/2024, 1:09:34.099 PM | 3 | John Doe | Write | 666 | 777 | Success |
8/19/2024, 1:09:36.099 PM | 3 | John Doe | Write | Start |
I would like the table to look like this:
TimeGenerated | CorrelationId | Caller | Action | RoleId | ObjectId | Status |
---|---|---|---|---|---|---|
8/13/2024, 4:09:34.099 PM | 1 | John Doe | Delete | 222 | 333 | Success |
8/15/2024, 8:09:34.099 PM | 2 | John Doe | Write | 444 | 555 | Success |
8/19/2024, 1:09:34.099 PM | 3 | John Doe | Write | 666 | 777 | Success |
Anyone has any tips on how to get this to work?
I have tried to summarize by the correlation ID
, but I will either still have duplicate rows or missing rows. I have tried to look for other solutions, but can't seem to find anything that works for this specific case.
Another (similar) solution would be to control which elements take_any()
will choose with prev()
and order by
, like that:
datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
[
datetime(8/13/2024, 4:09:35.099 PM), 1, 'John Doe', 'Delete', 111, long(null), 'Start',
datetime(8/19/2024, 1:09:36.099 PM), 3, 'John Doe', 'Write', long(null), long(null), 'Start',
datetime(8/15/2024, 8:09:35.099 PM), 2, 'John Doe', 'Write', long(null), long(null), 'Start',
datetime(8/13/2024, 4:09:34.099 PM), 1, 'John Doe', 'Delete', 222, 333, 'Success',
datetime(8/15/2024, 8:09:34.099 PM), 2, 'John Doe', 'Write', 444, 555, 'Success',
datetime(8/19/2024, 1:09:34.099 PM), 3, 'John Doe', 'Write', 666, 777, 'Success',
]
| order by CorrelationId, RoleId, ObjectId
| where prev(CorrelationId) != CorrelationId
| summarize take_any(*) by CorrelationId
This Query will only choose non-empty fields (if existing) of RoleId
and ObjectId
. Without that take_any()
would just choose random matching CorrelationIds
.