Tags: docker, elasticsearch, fluentd

Transform custom Docker logs via fluentd into Elasticsearch


I will be publishing some Docker containers that incorporate a common logging framework (written in Go). The log format is JSON.

There is distinct data in this custom JSON log format that I would like to be indexed and searchable in Kibana. My understanding is that I need to transform/filter this data, but I'm struggling to understand how this is done even after RTFM. Do I have to extract JSON from JSON?

Some example output from a minimal sample application, as seen in the Docker logs:

{"app_name":"SampleApp","app_port":6666,"app_version":"0.0.2","file":"/build/examples/sample/app/runmain/main.go:131","func":"example.com/code/microservices/examples/sample/app/runmain.mainErr.func1","fw_version":"v0.0.1","level":"info","msg":"listening","time":"2024-01-18T20:31:39.163970213+07:00"}

The data makes its way into fluentd and is logged as:

2024-01-18 20:31:39.000000000 +0000 f905b090d278: {"container_id":"f905b090d278ec2cc2f1f912acdbf8787a0a1c91d8ab7b00ad84e9da20c8c147","container_name":"/fervent_jemison","source":"stdout","log":"{\"app_name\":\"SampleApp\",\"app_port\":6666,\"app_version\":\"0.0.2\",\"file\":\"/build/examples/sample/app/runmain/main.go:131\",\"func\":\"example.com/code/microservices/examples/sample/app/runmain.mainErr.func1\",\"fw_version\":\"v0.0.1\",\"level\":\"info\",\"msg\":\"listening\",\"time\":\"2024-01-18T20:31:39.163970213+07:00\"}\r"}
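These records are produced by Docker's fluentd logging driver; the containers are started roughly like this (the image name is a placeholder, and a --log-opt tag=docker.{{.Name}} could also be added to control the fluentd tag, which otherwise defaults to the short container ID shown above):

docker run -d \
  --log-driver=fluentd \
  --log-opt fluentd-address=127.0.0.1:24224 \
  sample-app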
My current fluent.conf:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<filter docker.**>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>

<match *.**>
  @type copy

  <store>
    @type elasticsearch
    host es
    port 9200
    user elastic
    password elastic
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    <buffer> 
      flush_interval 1s
    </buffer>
  </store>

  <store>
    @type stdout
  </store>
</match>

Initially, I'll deploy this all locally on a single host.
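For that first local setup I have something along these lines in mind (a sketch only: image versions, credentials, and a custom fluentd image with the Elasticsearch plugin installed are assumptions on my part):

services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # security/TLS left off for a quick local test
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.3
    environment:
      - ELASTICSEARCH_HOSTS=http://es:9200
    ports:
      - "5601:5601"

  fluentd:
    build: ./fluentd                   # fluent/fluentd base image plus fluent-plugin-elasticsearch
    volumes:
      - ./fluent.conf:/fluentd/etc/fluent.conf
    ports:
      - "24224:24224"
      - "24224:24224/udp"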

Any tips or further direction would be greatly appreciated. This is a new world to me.


Solution

  • You are dealing with logs sent and received via the forward directive, so you cannot simply parse the JSON like this:

    <filter docker.**>
      @type parser
      key_name log
      reserve_data true
      <parse>
        @type json
      </parse>
    </filter>
    

    Instead, you should use regular expressions. It seems like you've done your job well with the logger, so the log will be easy to parse.

    I've been testing, and this regexp rule should work for you (a full filter sketch using it follows at the end of this answer):

    {"app_name":"(?<app_name>[^"]*)","app_port":(?<app_port>[^"]*),"app_version":"(?<app_version>[^"]*)","file":"(?<file>[^"]*)","func":"(?<func>[^"]*)","fw_version":"(?<fw_version>[^"]*)","level":"(?<level>[^"]*)","msg":"(?<msg>[^"]*)","time":"(?<time>[^"]*)
    

    As you said you are new to fluentd, I can only recommend that you test your parsing regexp using these two sites:

    https://fluentular.herokuapp.com/parse?regexp=%7B%22app_name%22%3A%22%28%3F%3Capp_name%3E%5B%5E%22%5D*%29%22%2C%22app_port%22%3A%28%3F%3Capp_port%3E%5B%5E%22%5D*%29%2C%22app_version%22%3A%22%28%3F%3Capp_version%3E%5B%5E%22%5D*%29%22%2C%22file%22%3A%22%28%3F%3Cfile%3E%5B%5E%22%5D*%29%22%2C%22func%22%3A%22%28%3F%3Cfunc%3E%5B%5E%22%5D*%29%22%2C%22fw_version%22%3A%22%28%3F%3Cfw_version%3E%5B%5E%22%5D*%29%22%2C%22level%22%3A%22%28%3F%3Clevel%3E%5B%5E%22%5D*%29%22%2C%22msg%22%3A%22%28%3F%3Cmsg%3E%5B%5E%22%5D*%29%22%2C%22time%22%3A%22%28%3F%3Ctime%3E%5B%5E%22%5D*%29&input=%7B%22app_name%22%3A%22SampleApp%22%2C%22app_port%22%3A6666%2C%22app_version%22%3A%220.0.2%22%2C%22file%22%3A%22%2Fbuild%2Fexamples%2Fsample%2Fapp%2Frunmain%2Fmain.go%3A131%22%2C%22func%22%3A%22example.com%2Fcode%2Fmicroservices%2Fexamples%2Fsample%2Fapp%2Frunmain.mainErr.func1%22%2C%22fw_version%22%3A%22v0.0.1%22%2C%22level%22%3A%22info%22%2C%22msg%22%3A%22listening%22%2C%22time%22%3A%222024-01-18T20%3A31%3A39.163970213%2B07%3A00%22%7D&time_format= and https://regex101.com/
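
    Put together, the parse section would look something like this (a sketch: the expression is just the rule above wrapped in /.../, and the time_format is a guess at your RFC 3339 timestamps, so adjust or drop it if the time fails to parse):

    <filter docker.**>
      @type parser
      key_name log
      reserve_data true
      <parse>
        @type regexp
        expression /{"app_name":"(?<app_name>[^"]*)","app_port":(?<app_port>[^"]*),"app_version":"(?<app_version>[^"]*)","file":"(?<file>[^"]*)","func":"(?<func>[^"]*)","fw_version":"(?<fw_version>[^"]*)","level":"(?<level>[^"]*)","msg":"(?<msg>[^"]*)","time":"(?<time>[^"]*)/
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%N%z
      </parse>
    </filter>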