
Logstash Grok Pattern vs Python Regex?


I am trying to configure logstash to manage my various log sources, one of which is Mongrel2. Mongrel2 logs in the tnetstring format, so a log message takes the form:

86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]
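For reference, each tnetstring element is a length prefix, a colon, that many characters of payload, and a one-character type tag: ',' for a string, '#' for an integer, ']' for a list. The outer "86:...]" is a list whose 86-character payload holds the individual fields. The minimal Python sketch below decodes the sample message under stated assumptions: it handles only those three tags, and it treats the length as a character count, which matches the byte count only for ASCII input. parse_tnetstring is a hypothetical helper, not part of Logstash or Mongrel2.

```python
def parse_tnetstring(data):
    """Decode one tnetstring at the start of data; return (value, remaining text).

    Sketch only: handles ',' (string), '#' (integer) and ']' (list), and treats
    the length prefix as a character count, which is only valid for ASCII.
    """
    length_text, colon, rest = data.partition(":")
    if not colon:
        raise ValueError("missing ':' after the length prefix")
    length = int(length_text)
    payload = rest[:length]                 # exactly `length` characters of data
    type_tag = rest[length:length + 1]      # one character right after the payload
    remaining = rest[length + 1:]           # whatever follows this element
    if type_tag == ",":
        return payload, remaining
    if type_tag == "#":
        return int(payload), remaining
    if type_tag == "]":
        # A list payload is itself a sequence of tnetstrings.
        items = []
        while payload:
            item, payload = parse_tnetstring(payload)
            items.append(item)
        return items, remaining
    raise ValueError(f"unsupported type tag {type_tag!r}")


message = ("86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#"
           "3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]")
fields, remainder = parse_tnetstring(message)
print(fields)
# ['localhost', '192.168.33.1', 57089, 1411396297, 'GET', '/', 'HTTP/1.1', 200, 145978]
```

The hostname is the first element of the decoded list, which is why a regex for it has to skip two length prefixes ("86:" and "9:") before reaching the value.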

I want to write my own grok patterns to extract certain fields from the above format. I started by testing my regex against the message above in an online regex tester. The regex is:

^(?:[^:]*\:){2}([^,]*)

This matches localhost. However, when I use the same regex as a grok pattern in the form

TEST ^(?:[^:]*\:){2}([^,]*)
MONGREL %{TEST:test}

and configure logstash with

filter {
  grok {
    match => [ "message", "%{MONGREL}" ]
  }
}

the same regex results in the match 86:9:localhost. I can't figure out where I am going wrong. Is it because the regex engine I was using to test is based on Python, while the grok filter's regex engine is based on Oniguruma?
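A quick way to see what is going on is to run the same expression in Python and print both the whole match and capture group 1. The sketch assumes the online tester was presumably highlighting group 1, and it approximates grok's %{TEST:test} as wrapping the entire TEST pattern in a named group; that wrapping is a simplification of grok's behaviour, not its exact implementation. Python's re spells named groups (?P<name>...), whereas grok/Oniguruma write (?<name>...).

```python
import re

message = ("86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#"
           "3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]")
TEST = r"^(?:[^:]*\:){2}([^,]*)"

# Plain match: group 0 is everything the pattern consumed, group 1 is the capture.
m = re.match(TEST, message)
print(m.group(0))  # 86:9:localhost
print(m.group(1))  # localhost

# Approximation of %{TEST:test}: the whole pattern sits inside the named group,
# so the "test" field receives the same text as group 0.
wrapped = re.match("(?P<test>" + TEST + ")", message)
print(wrapped.group("test"))  # 86:9:localhost
```

Both results come from the same match, so the engine is not the culprit here; the difference is which group ends up in the field.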

I am currently testing it in grokdebug with the following input:

86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]

and the following pattern

(?<hostname>^(?:[^:]*\:){2}([^,]*))

resulting in

{
  "hostname": [
    [
      "86:9:localhost"
    ]
  ]
}

where I want

{
  "hostname": [
    [
      "localhost"
    ]
  ]
}

Solution

  • A pattern like this will extract the host name:

    ^(\d+)?:(\d+)?:(?<hostname>[^,]+),
    

    Or, written in the same style as your original pattern:

    ^(?:[^:]*\:){2}(?<hostname>[^,]*)
    

    The capture name needs to go on the parentheses around the part you want to capture. Your pattern put the whole expression inside the named group, so it captured everything from the start of the line up to that point.
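Both corrected patterns can be checked quickly in Python before pasting them into a grok pattern file. The only change in this sketch is the named-group syntax: Python's re needs (?P<hostname>...) where grok/Oniguruma accept (?<hostname>...). Note that the first form requires a non-empty hostname followed by a comma, while the second accepts an empty value.

```python
import re

message = ("86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#"
           "3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]")

# Same expressions as above, translated to Python's (?P<name>...) syntax.
first_form = re.compile(r"^(\d+)?:(\d+)?:(?P<hostname>[^,]+),")
second_form = re.compile(r"^(?:[^:]*\:){2}(?P<hostname>[^,]*)")

for pattern in (first_form, second_form):
    match = pattern.match(message)
    print(match.group("hostname"))  # localhost
```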