
Grok/Oniguruma pattern to match first IP from X-Forwarded-For header


I'm trying to create a grok pattern that matches the first IP from the X-Forwarded-For header in an nginx log. A log line typically looks like this:

68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The first IP is the client's actual IP, which is the one I want to retrieve; the other two come from proxies, in our case Cloudflare and Varnish.

My pattern, which I tested on https://grokconstructor.appspot.com, looks like this:

FIRSTIPORHOST (^%{IPORHOST})(?:,\s%{IPORHOST})*

Unfortunately, it matches all the IPs despite the non-capturing group. What am I doing wrong? Or is there a better pattern?
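The behaviour can be reproduced outside grok. Below is a rough Python `re` translation of the pattern, with a simple IPv4 regex standing in for `%{IPORHOST}` (an assumption: the real grok definition also accepts IPv6 addresses and hostnames). It shows that a non-capturing group still consumes text, so the overall match covers every IP, while the capturing group holds only the first one.

```python
import re

# Rough stand-in for grok's %{IPORHOST}: IPv4 only (assumption; the real
# grok pattern also accepts IPv6 addresses and hostnames).
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# Python translation of: FIRSTIPORHOST (^%{IPORHOST})(?:,\s%{IPORHOST})*
first_ip_or_host = re.compile(rf"(^{IPV4})(?:,\s{IPV4})*")

line = "68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200]"
m = first_ip_or_host.match(line)

# The overall match (what a pattern tester highlights) spans every IP,
# because a non-capturing group still consumes the text it matches ...
print(m.group(0))  # 68.75.44.178, 172.68.146.54, 127.0.0.1
# ... but only the first IP ends up in the capturing group.
print(m.group(1))  # 68.75.44.178
```

In other words, "non-capturing" only controls what is stored, not what is matched.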

Clarification:

I want to read the whole log file into Elasticsearch using Filebeat, so I need to match the IPs somehow; otherwise I won't be able to match the rest of the line, such as the date or the user agent.


Solution

  • You need to add the (?:,\s[\d.]+)* after the %{IPORHOST:nginx.access.remote_ip} at the start of the pattern. See the fixed expression:

    "%{IPORHOST:nginx.access.remote_ip}(?:,\\s[\\d.]+)* - %{DATA:nginx.access.user_name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{WORD:nginx.access.method} %{DATA:nginx.access.url} HTTP/%{NUMBER:nginx.access.http_version}\" %{NUMBER:nginx.access.response_code} %{NUMBER:nginx.access.body_sent.bytes} \"%{DATA:nginx.access.referrer}\" \"%{DATA:nginx.access.agent}\""
    

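    The (?:,\s[\d.]+)* non-capturing repeated group matches zero or more occurrences of:

      • , : a comma
      • \s : a whitespace character
      • [\d.]+ : one or more digits or dots

    Because the group is non-capturing and contains no named grok field, the proxy IPs are consumed by the match but never stored; only the first IP ends up in nginx.access.remote_ip.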
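A rough Python `re` approximation of the fixed expression, applied to the sample line from the question, shows that only the first address lands in a named field. This is a sketch, not grok itself: the IPv4-only stand-in for `%{IPORHOST}`, the simplified `HTTPDATE`, and the underscore field names (Python group names cannot contain dots) are assumptions made for illustration.

```python
import re

# Simplified stand-ins for grok patterns (assumptions, not the real grok
# definitions): IPv4 only for IPORHOST, "anything up to ]" for HTTPDATE.
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
HTTPDATE = r"[^\]]+"

# Translation of the fixed grok expression; %{DATA} becomes the lazy .*?
NGINX_ACCESS = re.compile(
    rf"(?P<remote_ip>{IPV4})(?:,\s[\d.]+)*"  # first IP captured, proxies skipped
    r" - (?P<user_name>.*?)"
    rf" \[(?P<time>{HTTPDATE})\]"
    r' "(?P<method>\w+) (?P<url>.*?) HTTP/(?P<http_version>\d+(?:\.\d+)?)"'
    r" (?P<response_code>\d+) (?P<body_sent_bytes>\d+)"
    r' "(?P<referrer>.*?)" "(?P<agent>.*?)"'
)

line = ('68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] '
        '"GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

m = NGINX_ACCESS.match(line)
fields = m.groupdict()
print(fields["remote_ip"])      # 68.75.44.178
print(fields["time"])           # 15/May/2017:12:16:27 +0200
print(fields["response_code"])  # 301
```

The proxy addresses 172.68.146.54 and 127.0.0.1 appear in none of the fields. Note that [\d.]+ only skips IPv4-style hops; a proxy address containing letters or colons (a hostname or IPv6) would stop the repetition and the line would not match.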