Tags: logging, fluent, grafana-loki

Fluent Bit to Fluentd forwarding issue with Loki plugin


Description:
I have a setup where Fluent Bit is installed on an IoT device. This device is configured to forward its logs to a Fluentd pod running within an EKS cluster, which is exposed via an AWS Network Load Balancer (NLB). The primary role of this Fluentd pod is to act as an aggregator and subsequently forward these logs to a Loki instance.

However, Fluentd reports an "incoming chunk is broken" error when Fluent Bit sends logs to it. I have not been able to identify the cause; it could be a configuration mismatch between the two or a problem elsewhere in the path.

Configuration:

Fluentd Configuration:

fluentd.conf: |
    <source>
      @type forward
      port 24224
      resolve_hostname true
    </source>

    <match **>
      @type loki
      url http://my-loki-url:3100
      extra_labels {"job":"fluentd"}
      <buffer>
        flush_interval 10s
        flush_at_shutdown true
      </buffer>
    </match>

Fluent Bit Configuration:

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug

[INPUT]
    Name         cpu
    Tag          cpu_usage

[OUTPUT]
    Name         forward
    Match        *
    Host         <DNS name of the Fluentd NLB>
    Port         24224

[OUTPUT]
    Name          stdout
    Match         *

Expected Behavior:
Fluent Bit should seamlessly forward logs to the Fluentd pod in the EKS cluster. Fluentd should then process these logs and forward them to Loki, ensuring all logs from the IoT device are captured and stored in Loki for analysis.
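
To make the last hop concrete, here is a minimal Python sketch of the JSON body that Loki's HTTP push endpoint (`/loki/api/v1/push`) accepts. It only illustrates the target format; how the Fluentd Loki plugin builds its requests internally is not shown in this post, and the helper name `build_loki_push` and the sample log line are hypothetical.

```python
import json


def build_loki_push(labels, entries):
    """Build a Loki push body from stream labels and (unix_ns, line) pairs.

    Loki expects each value as [timestamp, line], where the timestamp is a
    string holding Unix time in nanoseconds.
    """
    return {
        "streams": [
            {
                "stream": dict(labels),
                "values": [[str(ts_ns), line] for ts_ns, line in entries],
            }
        ]
    }


# Hypothetical sample: one CPU reading labelled like the extra_labels setting.
payload = build_loki_push({"job": "fluentd"}, [(1700000000000000000, "cpu_p=1.5")])
body = json.dumps(payload)  # would be POSTed with Content-Type: application/json
```

If logs never show up in Loki, comparing what actually arrives against this shape can help separate forwarding problems from output problems.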

Actual Behavior:
Fluentd reports an "incoming chunk is broken" error while Fluent Bit forwards logs to it.


Update:

After diving deeper into the issue, I've come to believe that the "incoming chunk is broken" error is tied to the health check of the AWS Network Load Balancer (NLB). This error surfaces even without sending any logs, pointing to the possibility that it's triggered automatically when the NLB performs health checks.
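
One possible mechanism, sketched in Python: the forward protocol carries MessagePack arrays such as `[tag, time, record]`, so any other bytes arriving on port 24224 will not decode to an array. The sketch assumes the health check sends something like an HTTP request; the post does not say which health check type is configured, and a plain TCP check that only connects and disconnects would send no payload at all. The helper names are hypothetical, and Fluent Bit may use a different forward mode than the simple Message mode encoded here.

```python
import struct
import time
from typing import Dict, Optional


def pack_fixstr(text: str) -> bytes:
    """Encode a short string as a MessagePack fixstr (at most 31 bytes)."""
    raw = text.encode("utf-8")
    if len(raw) > 31:
        raise ValueError("fixstr holds at most 31 bytes")
    return bytes([0xA0 | len(raw)]) + raw


def forward_message(tag: str, record: Dict[str, str], ts: Optional[int] = None) -> bytes:
    """Hand-encode [tag, time, record] (forward protocol, Message mode)."""
    if len(record) > 15:
        raise ValueError("fixmap holds at most 15 pairs")
    ts = int(time.time()) if ts is None else ts
    out = bytes([0x93])                       # fixarray with 3 elements
    out += pack_fixstr(tag)
    out += b"\xce" + struct.pack(">I", ts)    # uint32 Unix timestamp
    out += bytes([0x80 | len(record)])        # fixmap header
    for key, value in record.items():
        out += pack_fixstr(key) + pack_fixstr(value)
    return out


def first_msgpack_type(data: bytes) -> str:
    """Name the MessagePack type implied by the first byte of data."""
    first = data[0]
    if first <= 0x7F:
        return "positive fixint"
    if first <= 0x8F:
        return "fixmap"
    if first <= 0x9F:
        return "fixarray"
    if first <= 0xBF:
        return "fixstr"
    return "other"


valid = forward_message("cpu_usage", {"cpu_p": "1.5"}, ts=0)
http_probe = b"GET / HTTP/1.1\r\n\r\n"  # assumed health-check payload
print(first_msgpack_type(valid))       # fixarray
print(first_msgpack_type(http_probe))  # positive fixint ('G' is 0x47)
```

Under that assumption, the receiver would see a stream of small integers rather than an array, which is consistent with the error appearing even when no logs are being sent.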

Furthermore, when I switched to the grafana/fluent-plugin-loki:main image, I encountered another issue: config error file="/fluentd/etc/fluentd.conf" error_class=Fluent::ConfigError error="Unknown output plugin ' loki'".
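
The plugin name quoted in that error, `' loki'`, begins with a space. One possible explanation (an assumption, not something confirmed here) is that the config file contains a typographic character picked up through copy-paste, such as a non-breaking space after `@type` or curly quotes in a label value. A small Python sketch with a hypothetical helper, `find_suspicious_chars`, can scan a config for such characters:

```python
import unicodedata

# Characters that look harmless in an editor but are not plain ASCII.
SUSPECT = {
    "\u00a0",  # no-break space
    "\u200b",  # zero width space
    "\ufeff",  # byte order mark / zero width no-break space
    "\u2018", "\u2019",  # curly single quotes
    "\u201c", "\u201d",  # curly double quotes
}


def find_suspicious_chars(text):
    """Return (line, column, unicode_name) for each suspect character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT:
                hits.append((line_no, col, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical sample config with a no-break space and curly quotes.
sample = (
    "<match **>\n"
    "  @type\u00a0loki\n"
    "  extra_labels {\u201cjob\u201d:\u201cfluentd\u201d}\n"
    "</match>"
)
for hit in find_suspicious_chars(sample):
    print(hit)
```

Running it against the real file contents (for example `find_suspicious_chars(open("fluentd.conf", encoding="utf-8").read())`) either lists the offending positions or returns an empty list, which rules this explanation out.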

Would appreciate any insights or suggestions on these matters. Thanks!


Solution

I've managed to resolve the issue. For anyone who runs into a similar problem, I've documented the full troubleshooting process and the solution in this GitHub issue: https://github.com/grafana/loki/issues/10254.