amazon-web-servicesamazon-cloudwatchcloudwatchamazon-cloudwatchlogs

CloudWatch logs acting weird


I have two log files with multi-line log statements. Both of them have same datetime format at the begining of each log statement. The configuration looks like this:

state_file = /var/lib/awslogs/agent-state

[/opt/logdir/log1.0]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log1.0
log_stream_name = /opt/logdir/logs/log1.0
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group


[/opt/logdir/log2-console.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log2-console.log
log_stream_name = /opt/logdir/log2-console.log
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group

The cloudwatch logs agent is sending log1.0 logs correctly to my log group on cloudwatch, however, its not sending log files for log2-console.log.

awslogs.log says:

2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.

Though server time is correct. Also weird thing is Line numbers mentioned in start_position and end_position does not exist in actual log file being pushed.

Anyone else experiencing this issue?


Solution

  • I was able to fix this.

    The state of awslogs was broken. The state is stored in a sqlite database in /var/awslogs/state/agent-state. You can access it via

    sudo sqlite3 /var/awslogs/state/agent-state
    

    sudo is needed to have write access.

    List all streams with

    select * from stream_state;
    

    Look up your log stream and note the source_id which is part of a json data structure in the v column.

    Then, list all records with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table

    select * from push_state where k="7675f84405fcb8fe5b6bb14eaa0c4bfd";
    

    The resulting record has a json data structure in the v column which contains a batch_timestamp. And this batch_timestamp seams to be wrong. It was in the past and any newer (more than 2 hours) log entries were not processed anymore.

    The solution is to update this record. Copy the v column, replace the batch_timestamp with the current timestamp and update with something like

    update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';
    

    Restart the service with

    sudo /etc/init.d/awslogs restart
    

    I hope it works for you!