For this issue I'm trying to create a grok pattern that matches the first IP from the X-Forwarded-For header in an nginx log. A log line typically looks like this:
68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
The first IP is the client's actual IP, which is the one I want to retrieve; the other two come from proxies, in our case Cloudflare and Varnish.
My pattern, which I tried on https://grokconstructor.appspot.com, looks like this:
FIRSTIPORHOST (^%{IPORHOST})(?:,\s%{IPORHOST})*
Unfortunately, it matches all IPs despite the non-capturing group. What am I doing wrong? Or is there a better pattern?
Clarification:
I want to read the whole log file into Elasticsearch using Filebeat. I therefore need to match the IPs somehow; otherwise I won't be able to match the rest of the line, such as the date or the user agent.
You need to add (?:,\s[\d.]+)* after %{IPORHOST:nginx.access.remote_ip} at the start of the pattern. See the fixed expression:
"%{IPORHOST:nginx.access.remote_ip}(?:,\\s[\\d.]+)* - %{DATA:nginx.access.user_name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{WORD:nginx.access.method} %{DATA:nginx.access.url} HTTP/%{NUMBER:nginx.access.http_version}\" %{NUMBER:nginx.access.response_code} %{NUMBER:nginx.access.body_sent.bytes} \"%{DATA:nginx.access.referrer}\" \"%{DATA:nginx.access.agent}\""
The (?:,\s[\d.]+)* non-capturing repeated group matches 0+ occurrences of:
, - a comma
\s - a whitespace
[\d.]+ - 1+ digits or dots.
This way, the proxy addresses are consumed by the match but no additional data is captured.
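To check the idea outside of grok, here is a minimal Python sketch that applies the same non-capturing group to the sample line from the question. It is only an approximation: the IPv4 sub-pattern and the `\S+` and `[^\]]+` pieces are simplified stand-ins I chose for IPORHOST, DATA and HTTPDATE, and the group names `remote_ip`, `user_name` and `time` are illustrative, not the Filebeat field names.

```python
import re

# Simplified stand-in for the start of the grok expression.
# Assumption: IPv4 only, whereas grok's IPORHOST also accepts hostnames and IPv6.
FIRST_IP = re.compile(
    r'^(?P<remote_ip>\d{1,3}(?:\.\d{1,3}){3})'  # first address: the client
    r'(?:,\s[\d.]+)*'                            # proxy addresses: matched, not captured
    r' - (?P<user_name>\S+) '                    # literal " - ", then the user name
    r'\[(?P<time>[^\]]+)\]'                      # timestamp between square brackets
)

line = (
    '68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] '
    '"GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

m = FIRST_IP.match(line)
fields = m.groupdict() if m else {}
print(fields)
```

Because the repeated group is optional, a line with a single address (no proxies) should still match, with that address ending up in `remote_ip`.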