regex grok graylog2

Grok pattern to match URIPATH with optional URIPARAM


I want to use a Grok pattern to parse this log line:

172.20.20.88 - - [10/Nov/2018:23:49:31 +0700] "GET /id/profile.pl?user=285&device=Bg3tlX HTTP/1.1" 502 852 "-" "Go-http-client/2.0" "0.009"

I am using COMMONAPACHELOG:

%{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)

I have tried %{URIPATH:request} and %{URIPARAM:request}, but the value of request is still /id/profile.pl?user=285&device=Bg3tlX. I expect it to be /id/profile.pl.

My reference is https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns


Solution

  • Your %{NOTSPACE:request} matches one or more non-whitespace characters, up to the space before HTTP/1.1" 502 85..., because the NOTSPACE pattern is \S+. It therefore matches the whole /id/profile.pl?user=285&device=Bg3tlX substring.

    You cannot use just URIPATH or URIPARAM, because you still need to match the rest of the input. You have to use both, but make URIPARAM optional after URIPATH by enclosing it within an optional non-capturing group, (?:...)?.

    So, replace %{NOTSPACE:request} with

    %{URIPATH:request}(?:%{URIPARAM:requestparam})?
                      ^^^                        ^^
    

    You can test the pattern at https://grokdebug.herokuapp.com/.
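
The effect of the optional group can also be checked outside Grok. The following is a minimal Python sketch, not the Graylog implementation: the two regexes are assumed to be close to the URIPATH and URIPARAM definitions in commonly distributed grok-patterns files and may differ from the version you run, and split_request is a hypothetical helper name.

```python
import re

# Assumed approximations of the Grok definitions; check your own grok-patterns file.
URIPATH = r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+"
URIPARAM = r"\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*"

# Mirrors: "%{WORD:verb} %{URIPATH:request}(?:%{URIPARAM:requestparam})? HTTP/%{NUMBER:httpversion}"
REQUEST_RE = re.compile(
    r'"(?P<verb>\w+) '
    r"(?P<request>" + URIPATH + r")"
    r"(?P<requestparam>" + URIPARAM + r")?"  # optional query string
    r' HTTP/(?P<httpversion>[\d.]+)"'
)


def split_request(line):
    """Return the named captures for the quoted request, or None if absent."""
    match = REQUEST_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        '172.20.20.88 - - [10/Nov/2018:23:49:31 +0700] '
        '"GET /id/profile.pl?user=285&device=Bg3tlX HTTP/1.1" 502 852'
    )
    fields = split_request(sample)
    print(fields["request"])       # /id/profile.pl
    print(fields["requestparam"])  # ?user=285&device=Bg3tlX
```

Because the path character class does not include ?, the path capture stops at the question mark and the optional group picks up the rest. For a request without a query string, requestparam is simply None.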