Tags: json, graylog2, graylog

JSON Extractor stops messages from showing up in graylog input


I have an nginx access_log Input that receives logs in JSON format. I have been trying to get the JSON extractors working, but to no avail.

Firstly, I was following this official Graylog tutorial: https://www.graylog.org/videos/json-extractor

This is a sample full message that comes in:

MyHost nginx: { "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com","upstream_cache_status": "","upstream_addr": "x.x.x.x.x:xxx","http_x_forwarded_for": "","http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }

It is then extracted into a json field using the following regex: nginx:\s+(.*)

The json field then looks like this:

{ "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com","upstream_cache_status": "","upstream_addr": "x.x.x.x.x:xxx","http_x_forwarded_for": "","http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }
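
A quick way to rule out the regex or the JSON itself as the culprit is to replay the same extraction outside Graylog. A minimal Python sketch, assuming a shortened stand-in message (the SAMPLE string and the extract_json helper are illustrative and not part of the setup above):

```python
import json
import re

# Shortened stand-in for the syslog line shown above (values are illustrative).
SAMPLE = (
    'MyHost nginx: { "timestamp": "1658474614.043", '
    '"response_status": 200, "nginx_access": true }'
)

# Same pattern as the regex extractor: capture everything after "nginx:".
PATTERN = re.compile(r"nginx:\s+(.*)")


def extract_json(message: str) -> dict:
    """Return the JSON payload that follows the 'nginx:' prefix as a dict."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("no 'nginx:' prefix found in message")
    # json.loads raises a ValueError subclass if the captured text is not valid JSON.
    return json.loads(match.group(1))


fields = extract_json(SAMPLE)
print(sorted(fields))
```

If this parses cleanly for a real message, the capture group and the payload are fine, and the problem is more likely to be in what happens to the extracted fields afterwards.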

However, from here on things only go downhill. I set up a default JSON extractor without changing any options, and when I click "Try" it shows the correct output:

[screenshot of the extractor "Try" output]

Sadly, after I save this extractor, messages stop showing up in my Input. There has to be some kind of error, but I couldn't find anything in /var/log/graylog-server/server.log.

Hope someone will help me figure this out!


Solution

  • Since the link to the solution has been removed by a moderator, here's a pipeline that ultimately got the job done:

    rule "parse the json log entries"
    when
      has_field("json")
    then
      // Parse the JSON string that the extractor stored in the "json" field.
      let json_tree = parse_json(to_string($message.json));

      // Select the keys of interest; "$.timestamp" is read under the name "time".
      let json_fields = select_jsonpath(json_tree, {
        time: "$.timestamp",
        remote_addr: "$.remote_addr",
        body_bytes_sent: "$.body_bytes_sent",
        request_time: "$.request_time",
        response_status: "$.response_status",
        request: "$.request",
        request_method: "$.request_method",
        host: "$.host",
        upstream_cache_status: "$.upstream_cache_status",
        upstream_addr: "$.upstream_addr",
        http_x_forwarded_for: "$.http_x_forwarded_for",
        http_referrer: "$.http_referrer",
        http_user_agent: "$.http_user_agent",
        http_version: "$.http_version",
        nginx_access: "$.nginx_access"
      });

      // Adding additional hours due to timezone differences, adjust it to your needs.
      let s_epoch = to_string(json_fields.time);
      let s = substring(s_epoch, 0, 10);
      let ts_millis = (to_long(s) + 7200) * 1000;
      let new_date = parse_unix_milliseconds(ts_millis);

      set_field("date", new_date);

      set_field("remote_addr", to_string(json_fields.remote_addr));
      set_field("body_bytes_sent", to_double(json_fields.body_bytes_sent));
      set_field("request_time", to_double(json_fields.request_time));
      set_field("response_status", to_double(json_fields.response_status));
      set_field("request", to_string(json_fields.request));
      set_field("request_method", to_string(json_fields.request_method));
      set_field("host", to_string(json_fields.host));
      set_field("upstream_cache_status", to_string(json_fields.upstream_cache_status));
      set_field("upstream_addr", to_string(json_fields.upstream_addr));
      set_field("http_x_forwarded_for", to_string(json_fields.http_x_forwarded_for));
      set_field("http_referrer", to_string(json_fields.http_referrer));
      set_field("http_user_agent", to_string(json_fields.http_user_agent));
      set_field("http_version", to_string(json_fields.http_version));
      set_field("nginx_access", to_bool(json_fields.nginx_access));
    end
    

    Note that you still have to configure an extractor. In this particular example the original message looks roughly like this: nginx: {json}. To reduce it to the JSON part only (for example with the regex from the question, nginx:\s+(.*), stored in the json field), configure an extractor as follows:

    [screenshot of the extractor configuration]

    That's all. You may need to adjust it for your setup, but it should work for most use cases.

    If anyone is interested in the entire discussion that led to this solution, see: https://community.graylog.org/t/failed-to-index-1-messages-failed-to-parse-field-datetime-of-type-date-in-document/24960/6
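
The timestamp arithmetic in the rule (keep the first ten characters of the epoch string, add an offset in seconds, multiply by 1000) can be sanity-checked outside Graylog. A minimal Python sketch; the function name shifted_millis is made up for illustration, and the 7200-second offset is simply the value used in the rule:

```python
from datetime import datetime, timezone

OFFSET_SECONDS = 7200  # the two-hour shift from the rule; adjust to your needs


def shifted_millis(epoch: str, offset_seconds: int = OFFSET_SECONDS) -> int:
    """Mirror the rule: truncate to whole seconds, shift, convert to milliseconds."""
    # Equivalent of substring(s_epoch, 0, 10): keeps the ten-digit seconds part
    # and drops the fractional ".043".
    seconds = int(epoch[:10])
    return (seconds + offset_seconds) * 1000


millis = shifted_millis("1658474614.043")
shifted = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(millis)
print(shifted.isoformat())
```

Two things this makes visible: the sub-second part of the nginx timestamp is discarded, and the ten-character cut assumes a ten-digit epoch value. If either matters for your logs, the rule would need a different conversion.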