I have an nginx access_log Input that receives logs in JSON format. I have been trying to get a JSON extractor working, but to no avail.
Firstly, I was following this official Graylog tutorial: https://www.graylog.org/videos/json-extractor
This is a sample full message that comes in:
MyHost nginx: { "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com", "upstream_cache_status": "", "upstream_addr": "x.x.x.x.x:xxx", "http_x_forwarded_for": "", "http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }
It is then extracted into a field named json with the following regex: nginx:\s+(.*)
The json field then looks like this:
{ "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com", "upstream_cache_status": "", "upstream_addr": "x.x.x.x.x:xxx", "http_x_forwarded_for": "", "http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }
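For anyone who wants to sanity-check the regex step outside Graylog, here is a rough Python sketch of the same idea: capture everything after the nginx: prefix and parse it as JSON. The shortened sample message and the function name extract_json_field are my own assumptions for illustration, and Python's re module is only a stand-in for Graylog's regex extractor, so treat it as an approximation rather than what Graylog does internally.

```python
import json
import re

# Shortened, hypothetical stand-in for the syslog line above
# (fewer keys than the real message).
raw_message = (
    'MyHost nginx: { "timestamp": "1658474614.043", '
    '"response_status": 200, "request_method": "GET", "nginx_access": true }'
)

# Same pattern as the extractor: capture everything after "nginx:" plus whitespace.
EXTRACTOR_PATTERN = re.compile(r"nginx:\s+(.*)")


def extract_json_field(message: str) -> dict:
    """Return the parsed JSON payload that follows the 'nginx:' prefix."""
    match = EXTRACTOR_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not contain an 'nginx:' prefix")
    return json.loads(match.group(1))


fields = extract_json_field(raw_message)
print(fields["response_status"], fields["nginx_access"])
```

If json.loads raises here on your own sample, the captured text is not valid JSON, which is worth ruling out before looking at the extractor itself.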
From here on, however, things only go downhill. I set up a basic default JSON extractor without changing any options, and when I click "Try" it shows the correct output.
Sadly, after I apply this extractor, messages stop showing up in my Input. There has to be some kind of error, but I couldn't find anything in the server.log located at /var/log/graylog-server/server.log.
Hope someone will help me figure this out!
Since the link to the solution has been removed by a moderator, here's the pipeline rule that ultimately got the job done:
rule "parse the json log entries"
when
    has_field("json")
then
    // Parse the extracted json field and select the keys of interest
    let json_tree = parse_json(to_string($message.json));
    let json_fields = select_jsonpath(json_tree, {
        time: "$.timestamp",
        remote_addr: "$.remote_addr",
        body_bytes_sent: "$.body_bytes_sent",
        request_time: "$.request_time",
        response_status: "$.response_status",
        request: "$.request",
        request_method: "$.request_method",
        host: "$.host",
        upstream_cache_status: "$.upstream_cache_status",
        upstream_addr: "$.upstream_addr",
        http_x_forwarded_for: "$.http_x_forwarded_for",
        http_referrer: "$.http_referrer",
        http_user_agent: "$.http_user_agent",
        http_version: "$.http_version",
        nginx_access: "$.nginx_access"
    });

    // Keep only the whole seconds of the epoch timestamp, then add 7200 seconds
    // (2 hours) to account for a timezone difference; adjust the offset to your needs
    let s_epoch = to_string(json_fields.time);
    let s = substring(s_epoch, 0, 10);
    let ts_millis = (to_long(s) + 7200) * 1000;
    let new_date = parse_unix_milliseconds(ts_millis);
    set_field("date", new_date);

    // Copy each selected value into its own message field with an explicit type
    set_field("remote_addr", to_string(json_fields.remote_addr));
    set_field("body_bytes_sent", to_double(json_fields.body_bytes_sent));
    set_field("request_time", to_double(json_fields.request_time));
    set_field("response_status", to_double(json_fields.response_status));
    set_field("request", to_string(json_fields.request));
    set_field("request_method", to_string(json_fields.request_method));
    set_field("host", to_string(json_fields.host));
    set_field("upstream_cache_status", to_string(json_fields.upstream_cache_status));
    set_field("upstream_addr", to_string(json_fields.upstream_addr));
    set_field("http_x_forwarded_for", to_string(json_fields.http_x_forwarded_for));
    set_field("http_referrer", to_string(json_fields.http_referrer));
    set_field("http_user_agent", to_string(json_fields.http_user_agent));
    set_field("http_version", to_string(json_fields.http_version));
    set_field("nginx_access", to_bool(json_fields.nginx_access));
end
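The timestamp handling in the rule is the part most likely to need adjusting, so here is a small Python sketch of the same arithmetic for checking an offset before changing the rule. The names shifted_epoch_millis and OFFSET_SECONDS are made up for this example; only the steps (take the first 10 characters, add the offset, multiply by 1000) come from the rule. Note that the fractional part of the nginx timestamp (the .043) is discarded by this approach.

```python
from datetime import datetime, timezone

OFFSET_SECONDS = 7200  # the two hours the rule adds; adjust to your timezone


def shifted_epoch_millis(nginx_timestamp: str,
                         offset_seconds: int = OFFSET_SECONDS) -> int:
    """Mirror the rule's arithmetic on an nginx epoch timestamp string.

    Keeps the first 10 characters (whole seconds), adds the offset and
    converts the result to milliseconds. The fraction is dropped.
    """
    whole_seconds = int(nginx_timestamp[:10])
    return (whole_seconds + offset_seconds) * 1000


millis = shifted_epoch_millis("1658474614.043")
shifted = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
print(millis, shifted.isoformat())
```

For the sample timestamp from the question this gives a value two hours later than the original epoch time, which is what ends up in the date field.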
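Note that you still have to configure an extractor. In this particular example, the original message looks a bit like this: nginx: {json}. To reduce it to the JSON alone, configure a regex extractor on the message that stores the match of nginx:\s+(.*) in a field named json, which is the field the rule checks for with has_field("json").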
That's all. You may need to adjust the field names and the time offset for your own logs, but it should work for most use cases.
If anyone is interested in the entire discussion that led to this solution, it is here: https://community.graylog.org/t/failed-to-index-1-messages-failed-to-parse-field-datetime-of-type-date-in-document/24960/6