I have a Splunk query like this:
index=my_app environment=test source="/users/sahild/app.log" "fname" OR "lname" OR "dob" OR "address" | <>
Now, from the initial results of the main search (everything before the pipe), I need to remove duplicate events so that only one event remains for each unique combination of "cls" and "mthd". For example, if my initial results are:
2024-08-28 23:07:30,285 INFO {"msg":"Response: {\"Output\":[{\"Address\":\"Bay Rd\",\"PostalCode\":\"12345\","cls":"myfirstclass","mthd":"firstMethod"}
2024-08-28 23:07:30,285 INFO {"msg":"Response: {\"Output\":[{\"Address\":\"Lincoln Rd\",\"PostalCode\":\"45678\","cls":"myfirstclass","mthd":"firstMethod"}
2024-08-28 23:07:30,285 INFO {"msg":"Response: {\"Output\":[{\"fName\":\"John\",\"PostalCode\":\"12345\","cls":"mySecondClass","mthd":"secondMethod"}
2024-08-28 23:07:30,285 INFO {"msg":"Response: {\"Output\":[{\"fName\":\"Emma\",\"PostalCode\":\"45678\","cls":"mySecondClass","mthd":"secondMethod"}
I want to filter the events so that each unique combined value of "cls" and "mthd" appears only once. The final result should look something like:
2024-08-28 23:07:30,285 INFO {"msg":"Response: {\"Output\":[{\"Address\":\"Lincoln Rd\",\"PostalCode\":\"45678\","cls":"myfirstclass","mthd":"firstMethod"}
2024-08-28 23:07:30,285 INFO {"msg":"Response: {\"Output\":[{\"fName\":\"John\",\"PostalCode\":\"12345\","cls":"mySecondClass","mthd":"secondMethod"}
The initial search returns hundreds of thousands of results, and I don't want repeated data for the same cls and mthd. I hope my ask is clear.
I have not been able to try much after the pipe, since I don't know much about Splunk regex or the commands that could achieve this. I need some expert help.
I found a solution. I need to use the rex command to extract "cls" and "mthd" into a Splunk field so that I can run Splunk commands on it. Something like this:
index=myIndex environment=test source="/path/to/logsfile/demo.log" ("firstName" OR "lastName") AND "mthd" AND "cls"
| rex field=_raw max_match=0 "(?<ClassAndMethodName>\"cls\"\:\"\w*\",\"mthd\"\:\"\w*\")"
| dedup ClassAndMethodName
This query returns only one event for each unique combined value of cls and mthd.
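A possible variation, which I have not run against this data: extract cls and mthd into two separate fields and pass both to dedup, so each value can also be used on its own later in the search. The field names cls and mthd are my own choice here, and the index, source, and search terms are copied from the query above. By default dedup keeps the first event returned for each combination, which is normally the most recent one.

```spl
index=myIndex environment=test source="/path/to/logsfile/demo.log" ("firstName" OR "lastName") AND "mthd" AND "cls"
| rex field=_raw "\"cls\":\"(?<cls>\w+)\",\"mthd\":\"(?<mthd>\w+)\""
| dedup cls mthd
```

If only the list of combinations is needed rather than the full events, replacing the dedup line with | stats count by cls, mthd should give one row per combination along with how often it occurred.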