Description:
I have a setup where Fluent Bit is installed on an IoT device. The device is configured to forward its logs to a Fluentd pod running in an EKS cluster and exposed through an AWS Network Load Balancer (NLB). The Fluentd pod acts as an aggregator and forwards these logs on to a Loki instance.
However, Fluentd reports an "incoming chunk is broken" error whenever Fluent Bit tries to send logs to it. The exact cause of this error is unclear; it could be a configuration mismatch or some other underlying issue.
Configuration:
Fluentd Configuration:
fluentd.conf: |
  <source>
    @type forward
    port 24224
    resolve_hostname true
  </source>
  <match **>
    @type loki
    url http://my-loki-url:3100
    extra_labels {"job":"fluentd"}
    <buffer>
      flush_interval 10s
      flush_at_shutdown true
    </buffer>
  </match>
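For completeness, the Fluentd forward port is exposed through the NLB with a LoadBalancer Service. I haven't pasted my exact manifest, but it looks roughly like this (a sketch assuming the AWS Load Balancer Controller; the name, selector, and scheme are placeholders rather than my real values):

    apiVersion: v1
    kind: Service
    metadata:
      name: fluentd-forward                 # placeholder name
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: "external"
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
        service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    spec:
      type: LoadBalancer
      selector:
        app: fluentd                        # placeholder label
      ports:
        - name: forward
          port: 24224
          targetPort: 24224
          protocol: TCP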
Fluent Bit Configuration:
[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug

[INPUT]
    Name    cpu
    Tag     cpu_usage

[OUTPUT]
    Name     forward
    Match    *
    Host     <fluentd-NLB-DNS-name>
    Port     24224

[OUTPUT]
    Name     stdout
    Match    *
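As a side note, a quick way to check that the device can reach the forward port at all, independent of this config file, is Fluent Bit's command-line mode (a rough sketch; the host is the same NLB DNS placeholder as above):

    fluent-bit -i cpu -t cpu_usage -o forward://<fluentd-NLB-DNS-name>:24224 -f 1

If this connects and records show up on the Fluentd side, the config file itself is unlikely to be the problem.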
Expected Behavior:
Fluent Bit should forward logs to the Fluentd pod in the EKS cluster, and Fluentd should then process them and forward them to Loki, so that all logs from the IoT device are captured and stored in Loki for analysis.
Actual Behavior:
Fluentd raises an "incoming chunk is broken" error while receiving the logs forwarded from Fluent Bit.
Update:
After digging deeper into the issue, I now believe the "incoming chunk is broken" error is tied to the AWS Network Load Balancer (NLB) health checks. The error shows up even when no logs are being sent at all, which points to it being triggered by the NLB's health-check probes opening connections on the forward port: the forward protocol expects MessagePack-framed chunks, so a bare probe connection looks like a broken chunk to Fluentd.
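If that is the case, one way around it (a sketch rather than a confirmed fix) is to give the NLB something other than port 24224 to probe, e.g. Fluentd's bundled monitor_agent:

    # extra source used only for load-balancer health checks
    <source>
      @type monitor_agent
      bind 0.0.0.0
      port 24220
    </source>

and then, assuming the AWS Load Balancer Controller manages the NLB, point the target group's health check at that port with the Service annotation service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "24220", so the NLB no longer probes 24224 directly.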
Furthermore, when I switched to the grafana/fluent-plugin-loki:main image, I ran into another issue: config error file="/fluentd/etc/fluentd.conf" error_class=Fluent::ConfigError error="Unknown output plugin ' loki'".
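One detail that stands out in that error is the leading space in ' loki': the plugin name Fluentd parsed is not loki, which can happen when an invisible character (e.g. a non-breaking space copied from a rendered page) ends up after @type in the config. Independently of that, if an image really does not ship the plugin, it can be baked into a custom image; a minimal sketch, assuming the fluent/fluentd base tag below exists and that fluent-plugin-grafana-loki is the right gem (both worth double-checking):

    FROM fluent/fluentd:v1.16-1
    USER root
    # the gem provides the "loki" output plugin used in <match **> above
    RUN gem install fluent-plugin-grafana-loki
    USER fluent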
Would appreciate any insights or suggestions on these matters. Thanks!
Solution:
I've managed to resolve the issue I was facing. For those who might encounter a similar problem in the future, I've documented the entire troubleshooting process and the solution in this GitHub issue: https://github.com/grafana/loki/issues/10254.