kqlazure-cloud-servicesazure-monitor

How to remove duplicate log entries in dynamic column through KQL Query


I am trying to remove duplicate log entries from the dynamic column through KQL query. Below is the query.

    let ContainerIdList = KubePodInventory
    | where ContainerName contains "boss-logger"
    | where Namespace has "devns"
    | where ClusterId =~ '/subscriptions/abc1234-bcbd-3124-bc3a-524132fb1786645/resourcegroups/rg-aks-dev/providers/Microsoft.ContainerService/managedClusters/aksdevcluster'
    | distinct ContainerID;
    ContainerLog 
    | where ContainerID in (ContainerIdList)
    | where LogEntry matches regex "ERROR" 
    | project LogEntry
    | distinct LogEntry

However it does not removing the duplicate entries due to few characters although the error is same. please find screenshot output below.

enter image description here

if you see the above screenshot, "ERROR" is same but it does not getting removed due to those three characters and different timestamp.

How can we remove those duplicates based on content from ERROR: not timestamp and those three characters.


Solution

  • How to remove duplicate log entries in dynamic column through KQL Query

    I have reproduced in my environment and got expected results as below:

    Firstly, I have taken a simple table like below:

    LogEntry
    
    2023-07-23 11:40:50,076 Error[xxx]
    2023-07-23 11:42:50,400 Error[yyy]
    2023-07-23 11:40:50,567 Error[xxx]
    2023-07-23 14:40:50,011 Error[zzz]
    

    enter image description here

    I have used below query:

    let x = datatable(LogEntry: string)
    [
        "2023-07-23 11:40:50,076 Error[xxx]",
        "2023-07-23 11:42:50,400 Error[yyy]",
        "2023-07-23 11:40:50,567 Error[xxx]",
        "2023-07-23 14:40:50,011 Error[zzz]"
    ];
    x
    | extend SplitLog = split(LogEntry, " ")
    | project C1 = todatetime(SplitLog[0]), C2 =SplitLog[1],cc=SplitLog[2], C3 = tostring(SplitLog[2])
    | summarize arg_max(C1,*) by C3
    | project-away C3
    | project C2,cc,C1= todatetime(C1)
    |extend SplitLog = split(C1, "T")
    | project F1 = SplitLog[0],C2=C2,C3=cc
    | project LogEntry = strcat(F1, " ", C2, " ", C3, " ")
    

    enter image description here

    Output:

    enter image description here

    Fiddle