Tags: regex, splunk, splunk-formula

How to group strings based on similarities in the string


I have a Splunk query that summarises errors by frequency:

index="pc_1" LogLevel=ERROR 
   | eval Message=split(_raw,"|") 
   | stats count(LogLevel) as Frequency by Message 
   | sort -Frequency

This produces results in the form

Message                                                               Frequency
No such user                                                          137
unable to deliver mail to example@email.com: Unable to reach server   70
unable to deliver mail to example1@email.com: Unable to reach server  43
unable to authenticate user 3456                                      8
unable to deliver mail to example2@email.com: Unable to reach server  6
unable to authenticate user 2321                                      5
unable to authenticate user 13321                                     3
...                                                                   .
...                                                                   .
...                                                                   .
unable to deliver mail to examplen@email.com: Unable to reach server  1

As the results show, errors that are essentially the same are counted separately because they differ in user IDs, email addresses, or machine IDs. I am looking for a way to group messages based on how similar the strings are. My current approach is to replace the variable part of each message with a common placeholder using a regular expression, and then compute the frequency:

index="pc_1" LogLevel=ERROR 
   | eval Message=split(_raw,"|")

   | eval Message=replace(Message, "unable to deliver mail to .*: Unable to reach server", "unable to deliver mail to [email]: Unable to reach server")
   | eval Message=replace(Message, "unable to authenticate user \d+", "unable to authenticate user [userId]")

   | stats count(LogLevel) as Frequency by Message 
   | sort -Frequency

This approach works but is cumbersome: there are many different types of errors, and implementing it means going through each one and writing a regular expression for it.

Is there a way to improve this with a query that summarizes these errors more effectively?


Solution

  • Answer for posterity:

    Perhaps the cluster command will help. It groups like messages together.
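
    A minimal sketch of how that might look for the query in the question. It assumes clustering on the raw event text (the command's default field) and the default similarity threshold of t=0.8, both of which would likely need tuning against real data. With showcount=t, each cluster keeps one representative event and its size is reported in the cluster_count field:

    index="pc_1" LogLevel=ERROR
       | cluster showcount=t t=0.8
       | table cluster_count _raw
       | sort -cluster_count

    Lowering t should merge more loosely related messages into one cluster, while raising it keeps clusters stricter, so it is worth trying a few values and checking which messages end up grouped together.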