[SOLVED] Splunk - Grouping by distinct field with stats of another field

Splunk - Grouping by distinct field with stats of another field

I have the following Splunk search that gathers the different Statuses from my dataset:

some type of search | eval Status = (REJECT_REASON) | bucket _time span=day | stats count by Status

This is a sample how my dataset looks:

CorrelationId      Reject_Reason    DATE_TIME
12345679           Accepted         20231030 14:00:00 
12345679           Accepted         20231030 14:00:00
12345679           Accepted         20231030 14:00:00
12345679           Sent             20231030 00:00:00
12345679           Sent             20231030 00:00:00
12345679           Sent             20231030 00:00:00
99399394           Rejected         20231030 00:00:00
99399394           Rejected         20231030 00:00:00
88393933           Accepted         20231030 14:00:00
88393933           Sent             20231030 00:00:00
33454545           Rejected         20231030 00:00:00

I would like to only get statuses for the distinct correlationId's, meaning that with the sample dataset I would only get back a count for 4 correlationId's and the statuses that are the latest date.

Example of desired result:

Status    Count
Accepted   2
Rejected   2

I have tried using "dedup correlationId" but it returned no results when I added that on to the search.

Solution

Based on the latest (as of Nov 1st) requirements below is my query:

|makeresults count=11 | streamstats count
| eval CorrelationID=case(count >=1 and count<=6, 12345679, count in (7,8), 99399394, count in (9,10), 88393933, count=11, 33454545),
       Reject_Reason=case(count in (1,2,3) OR count=9, "Accepted", count in (4,5,6) or count=10, "Sent", count in (7,8)  or count=11, "Rejected"),
       DATE_TIME=case(count in (1,2,3) or count=9, "20231030 14:00:00", true(), "20231030 00:00:00" )
| fields - _time, count      
``` The above is test data setup ```
| eval Status=Reject_Reason
| eventstats max(DATE_TIME) as mx by CorrelationID
| where DATE_TIME=mx
| dedup CorrelationID, Status
| stats dc(CorrelationID) as "Count" by Status