Tags: java, logstash, logback, logstash-logback-encoder

Lost log messages when using append instead of appendRaw of LogstashMarker


I stumbled across lost log messages in our environment.

We use an ELK stack and Logback with logstash-logback-encoder in our Java applications. To enable structured queries, we use LogstashMarkers to append structured arguments to the logs. In the case of the lost log messages, the argument is a string that contains JSON.

Some of the log messages are lost completely. Not only are the markers missing, but the complete log message as well. They are not visible in Kibana, and there are no errors in the logs of our application or the logstash server.

I created an example repository with minimal code and published it on GitHub: https://github.com/seism0saurus/logging-issue

To run it, you need a logstash server and netcat. The logger in that project is configured to use a console appender, so we have a reference of which messages were really logged. Additionally, there is one appender for a netcat server on localhost:8081 and one appender for the logstash server. Both have the same config; only the address is different. So I'm quite sure that a message is sent to the logstash server if it is sent to netcat.

Here is the code of the main class:

import static org.slf4j.LoggerFactory.getLogger;

import org.slf4j.Logger;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;

import net.logstash.logback.marker.LogstashMarker;
import net.logstash.logback.marker.Markers;

@SpringBootApplication
public class LoggingIssueApplication {

    private static final Logger LOGGER = getLogger("customLogger");
    private static final String MESSAGE = "{  \"ordernumber\": \"Test1\",  \"shippingmethod\": \"homeDelivery\",  \"customer\": {    \"id\": \"1337\",    \"salutation\": \"Mr\",    \"name\": \"seism0saurus\",    \"birthdate\": \"1900-05-01\",    \"email\": \"test@seism0saurus.de\",    \"phone\": \"\"  },  \"billingAddress\": {    \"name\": \"seism0saurus\",    \"address\": \"Teststreet 5\",    \"address2\": \"\",    \"postcode\": \"90451\",    \"city\": \"Nürnberg\",    \"country\": \"Deutschland\",    \"countrycode\": \"DEU\"  },  \"shippingAddress\": {    \"name\": \"seism0saurus\",    \"address\": \"Teststreet 5\",    \"address2\": \"\",    \"postcode\": \"90451\",    \"city\": \"Nürnberg\",    \"country\": \"Deutschland\",    \"countrycode\": \"DEU\"  },  \"totalItems\": 1,  \"items\": [    {      \"position\": 53,      \"bundlenumber\": 53,      \"itemnumber\": \"900508\",      \"itemdescription\": \"Dinosaur Cookies\",      \"quantity\": 4,      \"saleType\": \"KL\",      \"saleItemType\": \"\",      \"grossAmount\": 40.0,      \"discounts\": [      ]    }  ]}";

    public static void main(String[] args) {
        SpringApplication.run(LoggingIssueApplication.class, args);
    }

    @EventListener(ApplicationReadyEvent.class)
    public void testAppender() throws InterruptedException {

        LOGGER.warn("1 - Log without marker and message as parameter: {}", MESSAGE);

        LogstashMarker payloadMarker = Markers.append("payload", MESSAGE);
        LOGGER.warn(payloadMarker, "2 - Log with payload marker without parameter");
        LOGGER.warn(payloadMarker, "3 - Log with payload marker and message as parameter: {}", MESSAGE);

        LogstashMarker rawMarker = Markers.appendRaw("raw", MESSAGE);
        LOGGER.warn(rawMarker, "4 - Log with raw marker without parameter");

        // Needed so the appenders can finish their work before the application shuts down.
        Thread.sleep(5000);
    }
}

And this is the logstash appender:

  <appender name="nc" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>localhost:8081</destination>
    <keepAliveDuration>5 minutes</keepAliveDuration>
    <listener class="net.logstash.logback.appender.listener.FailureSummaryLoggingAppenderListener"/>

    <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
      <providers>
        <mdc/>
        <context/>
        <version/>
        <logLevel/>
        <logLevelValue/>
        <loggerName/>

        <pattern>
          <pattern>
            {
            "serviceName": "logging-issue",
            "environment": "dev",
            "errorMessage": "%ex{0}"
            }
          </pattern>
        </pattern>

        <threadName/>
        <message/>
        <logstashMarkers/>
        <arguments/>
        <stackTrace/>
      </providers>
    </encoder>
  </appender>

When I run the application, I get the expected 4 log messages. All 4 appear in the console and on the netcat server, but only numbers 1 and 4 appear in Kibana. Therefore I assume that there is a difference in the handling of Markers.append and Markers.appendRaw. The MESSAGE is a string containing JSON, but I hope I don't have to inspect the content of the string to decide which method to invoke.
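For reference, the two markers produce structurally different events. As far as I can tell, append serializes the string as an escaped JSON string value, while appendRaw splices the content into the event verbatim, so it becomes a nested object (abbreviated, illustrative output):

```
{"message":"2 - Log with payload marker without parameter","payload":"{ \"ordernumber\": \"Test1\", ... }"}
{"message":"4 - Log with raw marker without parameter","raw":{ "ordernumber": "Test1", ... }}
```

So with append the field is a single string from the consumer's point of view, and with appendRaw it is a full JSON structure.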

There are no error messages in the log of the logstash server. It doesn't seem to be a connection error, an unknown or unreachable host, or full buffers. I reproduced all these cases, and each of them leaves a meaningful error message in the logs.

Do you know what happens here? Do I have to check the content of the strings I want to put into a Marker to prevent the loss of the complete log message?

EDIT: I got more information about the problem. A logstash instance inside Docker on my local machine does receive all logs and writes them to a file with the following pipeline:

input {
  tcp {
    port => 9000
    codec => json_lines
  }
}

output {
  file {
    path => "/mounted_dir/logstash.out"
    codec => line {
      format => "%{message}"
    }
  }
}
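A side note for debugging: the line codec above only writes the message field, so marker fields are not visible in the output file. A stdout output with the rubydebug codec prints complete events; a sketch that could be added alongside the existing file output:

```
output {
  stdout {
    codec => rubydebug
  }
}
```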

EDIT 2: Based on https://jaxenter.de/elastic-stack-containern-docker-86374 I built a complete ELK stack for comparison. It doesn't lose any log message. Therefore there has to be a problem with the ELK stack our admins built. Do you have any ideas?


Solution

  • We found the problem. To help others who come across this question, I will explain the problem and the solution here.

    The log messages were successfully transmitted to logstash and Elasticsearch, so there were no errors in those logs. The reason we could not find the log entries was the "payload" field. We have multiple services, and one of them was sending an object instead of a string. That led to a mapping conflict in Elasticsearch, and the affected log messages could not be accessed. [screenshot: error message]

    The solution for us was to define an explicit mapping and to introduce a naming convention: we now have a text field payload and an object field payloadAsObject. This convention is used in all of our services.

    The problematic log entries were far enough in the past that we could simply delete the old index to resolve the conflict. But there are ways to reindex the data without downtime, for example this one: https://medium.com/craftsmenltd/rebuild-elasticsearch-index-without-downtime-168363829ea4
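A minimal sketch of such an explicit mapping via an index template. The template name and index pattern are illustrative, and depending on the Elasticsearch version the newer _index_template API may be required instead of the legacy _template endpoint:

```
PUT _template/logging-issue
{
  "index_patterns": ["logstash-*"],
  "mappings": {
    "properties": {
      "payload":         { "type": "text" },
      "payloadAsObject": { "type": "object" }
    }
  }
}
```

With this in place, a service that accidentally sends an object into payload fails to index that single document instead of poisoning the mapping for everyone.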
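For completeness, reindexing without downtime roughly follows the pattern from the linked article: create a new index with the corrected mapping, copy the data over, then switch an alias. The index names here are illustrative:

```
POST _reindex
{
  "source": { "index": "logstash-old" },
  "dest":   { "index": "logstash-new" }
}
```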