logstash grok

Regex group from within custom grok pattern


I'm trying to create custom grok patterns to extract various data using Logstash, and I'm racking my brain trying to get the syntax right to pull the equivalent of regex group 1 from my log rows. I've read a lot of threads on this over the past 2 days, but none of them fit my example, and none of the built-in grok patterns seem to pull the value I need.

3 example log file rows look similar to this (with abbreviated data for the examples):

2022-04-07 12:52:06,184:INFO   :Thread-70_SCHEDULE.0001: MsgID=63759111848731967
2022-04-07 07:23:39,876:INFO   :Thread-53_OrderInterfaceIntServer: MsgID=21316889724753182|
07:23:40,482 INFO  [stdout] (http-/0.0.0.0:8080-20) 2022-04-07 07:23:40,482:ERROR

I want to create a custom grok pattern called SERVICE that extracts a value using this regex:

Thread-[0-9]{2}_(.*?)\:

that for the 3 rows would return:

In the log:

In grok, I can define this in 2 ways:

SERVICE Thread-[0-9]{2}_(.*?)\:
or as a field using (?<service>Thread-[0-9]{2}_(.*?)\:)

However, for row 1, I get this value:

{
  "service": [
    [
      "Thread-70_SCHEDULE.0001:"
    ]
  ]
}

What I want is:

{
  "service": [
    [
      "SCHEDULE.0001"
    ]
  ]
}

That is the equivalent of regex group 1. I can't figure out how to write the grok pattern to get that result.


Solution

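  • You do not have to include all of the pattern in the capture group. Put only the part you want to keep inside the named group and leave the rest of the pattern outside it. You can use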

    grok { match => { "message" => "Thread-[0-9]{2}_(?<service>.*?):" } }
    

    For the first two sample rows, that will result in

       "service" => "SCHEDULE.0001",
    
       "service" => "OrderInterfaceIntServer",
    

    and a "_grokparsefailure" tag on the third event, which contains no "Thread-NN_" prefix for the pattern to match.
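
To sanity-check the expression outside Logstash, here is a minimal Python sketch that runs the same regex over the three sample rows from the question. It is only an approximation of what grok does: grok uses the Oniguruma engine and writes a named group as (?<service>...), whereas Python's re module requires (?P<service>...). The names SERVICE_RE, SAMPLE_ROWS, and extract_service are made up for this illustration.

```python
import re

# Same expression as in the grok filter above. Note the Python-specific
# (?P<service>...) spelling; grok itself uses (?<service>...).
SERVICE_RE = re.compile(r"Thread-[0-9]{2}_(?P<service>.*?):")

# The three abbreviated sample rows from the question.
SAMPLE_ROWS = [
    "2022-04-07 12:52:06,184:INFO   :Thread-70_SCHEDULE.0001: MsgID=63759111848731967",
    "2022-04-07 07:23:39,876:INFO   :Thread-53_OrderInterfaceIntServer: MsgID=21316889724753182|",
    "07:23:40,482 INFO  [stdout] (http-/0.0.0.0:8080-20) 2022-04-07 07:23:40,482:ERROR",
]


def extract_service(row):
    """Return the text captured by the 'service' group, or None if the row does not match."""
    # search() scans the whole row, similar to an unanchored grok match.
    match = SERVICE_RE.search(row)
    return match.group("service") if match else None


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(extract_service(row))
```

Only the text inside the named group is returned. The "Thread-NN_" prefix and the trailing colon still have to match, but they are not part of the captured value. The third row has no "Thread-" prefix, so the function returns None, which corresponds to the "_grokparsefailure" tag in Logstash.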