logging, fluentd, fluent, fluent-bit, kubernetes-operator

fluentbit connection to fluentd refused


The issue

I have been trying to use the fluent-operator to deploy fluentbit and fluentd in a multi-tenant scenario on an EKS cluster.

The goal is to collect logs with fluentbit and forward them to fluentd, which processes them and sends them to OpenSearch.

fluentbit collects the logs fine, but its pod then logs the following errors when trying to reach fluentd:

[2023/02/10 17:54:57] [error] [net] TCP connection failed: fluentd.fluent.svc:24224 (Connection refused)
[2023/02/10 17:54:57] [error] [output:forward:forward.0] no upstream connections available
[2023/02/10 17:54:57] [error] [engine] chunk '12-1676051688.632628964.flb' cannot be retried: task_id=16, input=tail.1 > output=forward.0
[2023/02/10 17:54:57] [ warn] [engine] failed to flush chunk '12-1676051696.570563472.flb', retry in 6 seconds: task_id=7, input=tail.1 > output=forward.0 (out_id=0)
[2023/02/10 17:54:57] [error] [engine] chunk '12-1676051685.661115204.flb' cannot be retried: task_id=8, input=tail.1 > output=forward.0
[2023/02/10 17:54:57] [ warn] [engine] failed to flush chunk '12-1676051696.742618827.flb', retry in 6 seconds: task_id=10, input=tail.1 > output=forward.0 (out_id=0)
[2023/02/10 17:54:57] [ info] [input:tail:tail.1] inode=45094081 handle rotation(): /var/log/containers/fluent-bit-dj2j8_fluent_fluent-bit-a1d1b1304f8a9f66bb394f20e2400898f9dbe354992f4190e44d2f6b2d48d80f.log => /var/log/pods/fluent_fluent-bit-dj2j8_b907b949-bc53-47e6-91f0-709647fd7733/fluent-bit/0.log.20230210-175457
[2023/02/10 17:54:57] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=45094081 watch_fd=966

Fluentd itself starts up fine but then can't connect to OpenSearch:

level=info msg="Fluentd started"
2023-02-14 21:22:23 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-02-14 21:22:23 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2023-02-14 21:22:24 +0000 [info]: gem 'fluentd' version '1.15.3'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-aws-elasticsearch-service' version '2.4.1'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-dedot_filter' version '1.0.0'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.14'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.2.4'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-kafka' version '0.18.1'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-label-router' version '0.2.10'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.0.10'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-oss' version '0.0.2'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-02-14 21:22:24 +0000 [info]: gem 'fluent-plugin-sumologic_output' version '1.8.0'
2023-02-14 21:22:25 +0000 [info]: using configuration file: <ROOT>
  <system>
    rpc_endpoint "127.0.0.1:24444"
    log_level info
    workers 1
  </system>
  <source>
    @type forward
    bind "0.0.0.0"
    port 24224
  </source>
  <match **>
    @id main
    @type label_router
    <route>
      @label "@d2d59c6c703bc71418b747e394ea26bb"
      <match>
        namespaces fluent,kube-system,kyverno,observability-system
      </match>
    </route>
  </match>
  <label @d2d59c6c703bc71418b747e394ea26bb>
    <match **>
      @id ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0
      @type opensearch
      host "vpc-XXXXX-us-west-2-XXXXXXX.us-west-2.es.amazonaws.com"
      logstash_format true
      logstash_prefix "logs"
      port 9200
    </match>
  </label>
  <match **>
    @type null
    @id main-no-output
  </match>
  <label @FLUENT_LOG>
    <match fluent.*>
      @type null
      @id main-fluentd-log
    </match>
  </label>
</ROOT>
2023-02-14 21:22:25 +0000 [info]: starting fluentd-1.15.3 pid=13 ruby="3.1.3"
2023-02-14 21:22:25 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
2023-02-14 21:22:25 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-02-14 21:22:27 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2023-02-14 21:22:27 +0000 [info]: adding match in @d2d59c6c703bc71418b747e394ea26bb pattern="**" type="opensearch"
2023-02-14 21:22:36 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Could not communicate to OpenSearch, resetting connection and trying again. connect_write timeout reached
2023-02-14 21:22:36 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Remaining retry: 14. Retry to communicate after 2 second(s).
2023-02-14 21:22:45 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Could not communicate to OpenSearch, resetting connection and trying again. connect_write timeout reached

The configurations of the fluentd-output-opensearch ClusterOutput, the fluentd and fluent-bit Services, the fluentbit ClusterOutput (forward), and the fluentd and fluent-bit Pods all look OK (shown in that order below):

apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterOutput
metadata:
  annotations:
    meta.helm.sh/release-name: fluent-operator
    meta.helm.sh/release-namespace: fluent
  creationTimestamp: "2023-02-10T14:28:57Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    output.fluentd.fluent.io/enabled: "true"
  name: fluentd-output-opensearch
  resourceVersion: "8982613"
  uid: dcacb711-72b5-4fb3-9ec8-fab78f85e171
spec:
  outputs:
  - buffer:
      path: /buffers/opensearch
      type: file
    opensearch:
      host: vpc-XXXX-us-west-2-XXXXXXXXXX.us-west-2.es.amazonaws.com
      logstashFormat: true
      logstashPrefix: logs
      port: 9200
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-02-10T12:29:53Z"
  labels:
    app.kubernetes.io/component: fluentd
    app.kubernetes.io/instance: fluentd
    app.kubernetes.io/name: fluentd
  name: fluentd
  namespace: fluent
  ownerReferences:
  - apiVersion: fluentd.fluent.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Fluentd
    name: fluentd
    uid: 98e29fa5-c0c0-4239-a7d8-61eb3ff59c18
  resourceVersion: "8902659"
  uid: 62273018-9921-41b9-a38a-32c703264a4c
spec:
  clusterIP: 10.100.195.123
  clusterIPs:
  - 10.100.195.123
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: forward
    port: 24224
    protocol: TCP
    targetPort: forward
  selector:
    app.kubernetes.io/component: fluentd
    app.kubernetes.io/instance: fluentd
    app.kubernetes.io/name: fluentd
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-02-13T18:44:57Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: fluent-bit
  name: fluent-bit
  namespace: fluent
  ownerReferences:
  - apiVersion: fluentbit.fluent.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: FluentBit
    name: fluent-bit
    uid: 4fae4404-bea4-4cdd-aaf3-52b97d758bff
  resourceVersion: "12053875"
  uid: 89fa21db-cd70-4bcd-81f6-a1bd47cab74c
spec:
  clusterIP: 10.100.253.128
  clusterIPs:
  - 10.100.253.128
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: metrics
    port: 2020
    protocol: TCP
    targetPort: 2020
  selector:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: fluent-bit
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  annotations:
    meta.helm.sh/release-name: fluent-operator
    meta.helm.sh/release-namespace: fluent
  creationTimestamp: "2023-02-10T12:29:44Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: fluentd
  resourceVersion: "8902495"
  uid: b333b5e4-128d-419c-a726-cd8a8edeb4cf
spec:
  forward:
    host: fluentd.fluent.svc
    port: 24224
  matchRegex: (?:kube|service)\.(.*)
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2023-02-13T18:44:58Z"
  generateName: fluentd-
  labels:
    app.kubernetes.io/component: fluentd
    app.kubernetes.io/instance: fluentd
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: fluentd
    controller-revision-hash: fluentd-d8ddb8bd9
    statefulset.kubernetes.io/pod-name: fluentd-0
  name: fluentd-0
  namespace: fluent
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: fluentd
    uid: 7c239d83-7421-4ed6-88a8-2e5f6c76facd
  resourceVersion: "12054209"
  uid: 2a3b0d84-78e6-4ae1-a90c-4a3d6fccba71
spec:
  containers:
  - env:
    - name: BUFFER_PATH
      value: /buffers
    image: kubesphere/fluentd:v1.15.3
    imagePullPolicy: IfNotPresent
    name: fluentd
    ports:
    - containerPort: 2021
      name: metrics
      protocol: TCP
    - containerPort: 24224
      name: forward
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /fluentd/etc
      name: config
      readOnly: true
    - mountPath: /buffers
      name: fluentd-buffer-pvc
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-n7vbs
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: fluentd-0
  nodeName: ip-172-23-137-214.us-west-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: fluentd
  serviceAccountName: fluentd
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: fluentd-buffer-pvc
    persistentVolumeClaim:
      claimName: fluentd-buffer-pvc-fluentd-0
  - name: config
    secret:
      defaultMode: 420
      secretName: fluentd-config
  - name: kube-api-access-n7vbs
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:45:02Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:45:14Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:45:14Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:45:02Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://729776240915f3377c6a9bf06a7e19a5213672da96468cd9c8b599f157d6386c
    image: docker.io/kubesphere/fluentd:v1.15.3
    imageID: docker.io/kubesphere/fluentd@sha256:58caf053b0f903ce3d0fc86b7bc748839e1a4aed6c7d8c1d3285d28553e93bce
    lastState: {}
    name: fluentd
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-02-13T18:45:13Z"
  hostIP: 172.23.137.214
  phase: Running
  podIP: 172.30.43.227
  podIPs:
  - ip: 172.30.43.227
  qosClass: Burstable
  startTime: "2023-02-13T18:45:02Z"
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2023-02-13T18:44:57Z"
  generateName: fluent-bit-
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: fluent-bit
    controller-revision-hash: 7b98cd9f49
    pod-template-generation: "1"
  name: fluent-bit-2sx6v
  namespace: fluent
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: fluent-bit
    uid: d33dcff3-2e04-42dd-816c-0edb3ea63a19
  resourceVersion: "12053982"
  uid: 296d44ba-b761-47f0-a4ec-ed55dfa507dd
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - ip-172-23-137-29.us-west-2.compute.internal
  containers:
  - env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    image: kubesphere/fluent-bit:v2.0.9
    imagePullPolicy: IfNotPresent
    name: fluent-bit
    ports:
    - containerPort: 2020
      name: metrics
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 200Mi
      requests:
        cpu: 10m
        memory: 25Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /containers
      name: varlibcontainers
      readOnly: true
    - mountPath: /fluent-bit/config
      name: config
      readOnly: true
    - mountPath: /var/log/
      name: varlogs
      readOnly: true
    - mountPath: /var/log/journal
      name: systemd
      readOnly: true
    - mountPath: /fluent-bit/tail
      name: positions
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-gzqz8
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-172-23-137-29.us-west-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: fluent-bit
  serviceAccountName: fluent-bit
  terminationGracePeriodSeconds: 30
  tolerations:
  - operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  volumes:
  - hostPath:
      path: /containers
      type: ""
    name: varlibcontainers
  - name: config
    secret:
      defaultMode: 420
      secretName: fluent-bit-config
  - hostPath:
      path: /var/log
      type: ""
    name: varlogs
  - hostPath:
      path: /var/log/journal
      type: ""
    name: systemd
  - hostPath:
      path: /var/lib/fluent-bit/
      type: ""
    name: positions
  - name: kube-api-access-gzqz8
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:44:57Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:44:59Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:44:59Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-02-13T18:44:57Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://7e5101ec69c0b0f4749f3462306801b41ff41e6c288eff74a75e253e79626720
    image: docker.io/kubesphere/fluent-bit:v2.0.9
    imageID: docker.io/kubesphere/fluent-bit@sha256:7b66bfc157e60f17e26c5e1dbbe1ae79768446ffaad06b4a013a3efb65815cce
    lastState: {}
    name: fluent-bit
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-02-13T18:44:58Z"
  hostIP: 172.23.137.29
  phase: Running
  podIP: 172.30.30.141
  podIPs:
  - ip: 172.30.30.141
  qosClass: Burstable
  startTime: "2023-02-13T18:44:57Z"

Also, the Fluentd custom resource's globalInputs look correct for the forward input:

apiVersion: fluentd.fluent.io/v1alpha1
kind: Fluentd
metadata:
  annotations:
    meta.helm.sh/release-name: fluent-operator
    meta.helm.sh/release-namespace: fluent
  creationTimestamp: "2023-02-13T20:13:59Z"
  finalizers:
  - fluentd.fluent.io
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: fluentd
  name: fluentd
  namespace: fluent
  resourceVersion: "12115920"
  uid: f1448972-45d5-4a36-8d0d-ed2cf65ff730
spec:
  fluentdCfgSelector:
    matchLabels:
      config.fluentd.fluent.io/enabled: "true"
  globalInputs:
  - forward:
      bind: 0.0.0.0
      port: 24224
  image: kubesphere/fluentd:v1.15.3
  replicas: 1
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 128Mi
status:
  messages: all matched cfgs is valid
  state: active
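
The operator renders this CR, together with the matching cluster config and outputs, into the fluentd-config secret mounted at /fluentd/etc (see the pod spec above), which is where the fluent.conf shown in the startup log comes from. Decoding that secret is a quick way to confirm what fluentd is actually running with; a sketch, assuming jq 1.6+ is available (for @base64d):

# Print every key in the generated config secret, base64-decoded
kubectl get secret fluentd-config -n fluent -o json \
  | jq -r '.data | to_entries[] | "== \(.key) ==", (.value | @base64d)'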

All the fluentbit, fluentd, and fluent-operator pods are up and running in the same namespace.

I also exec'd into both the fluentbit and fluentd pods and ran ping from the fluentbit container to fluentd's pod IP, which works:

root@fluent-bit-gtslr:/# ping  172.30.30.141
PING 172.30.30.141 (172.30.30.141) 56(84) bytes of data.
64 bytes from 172.30.30.141: icmp_seq=1 ttl=253 time=0.742 ms
64 bytes from 172.30.30.141: icmp_seq=2 ttl=253 time=0.711 ms
64 bytes from 172.30.30.141: icmp_seq=3 ttl=253 time=0.693 ms
64 bytes from 172.30.30.141: icmp_seq=4 ttl=253 time=0.730 ms
64 bytes from 172.30.30.141: icmp_seq=5 ttl=253 time=0.730 ms
^C
--- 172.30.30.141 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4093ms
rtt min/avg/max/mdev = 0.693/0.721/0.742/0.017 ms
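
Note that ping only proves ICMP reachability; it says nothing about whether anything is listening on 24224. A TCP-level probe against the service is more conclusive; a minimal sketch, assuming the fluent-bit image ships bash (for /dev/tcp) and timeout:

# Exit code 0 means something accepted the connection; a refusal here
# mirrors the "Connection refused" in the fluent-bit logs above.
kubectl exec -n fluent fluent-bit-gtslr -- \
  timeout 3 bash -c 'echo > /dev/tcp/fluentd.fluent.svc/24224' \
  && echo open || echo refused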

Why am I getting this error?

fluent-operator installation

I installed the fluent-operator via Helm:

helm install fluent-operator --create-namespace -n fluent https://github.com/fluent/fluent-operator/releases/download/v2.0.1/fluent-operator.tgz --values values.yaml
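
With the release installed, the operator Deployment, the fluent-bit DaemonSet, and the fluentd StatefulSet should all come up in the fluent namespace; a quick check (resource names as in the dumps above):

kubectl get pods -n fluent
kubectl get daemonsets,statefulsets -n fluent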

The values.yaml has the following configuration:

# Default values for fluentbit-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

#Set this to containerd or crio if you want to collect CRI format logs
containerRuntime: docker
#  If you want to deploy a default Fluent Bit pipeline (including Fluent Bit Input, Filter, and output) to collect Kubernetes logs, you'll need to set the Kubernetes parameter to true
# see https://github.com/fluent/fluent-operator/tree/master/manifests/logging-stack
Kubernetes: true

operator:
# The init container is to get the actual storage path of the docker log files so that it can be mounted to collect the logs.
# see https://github.com/fluent/fluent-operator/blob/master/manifests/setup/fluent-operator-deployment.yaml#L26
  initcontainer:
    repository: "docker"
    tag: "20.10"
  container:
    repository: "kubesphere/fluent-operator"
    tag: "latest"
    # FluentBit operator resources. Usually user needn't to adjust these.
  resources:
    limits:
      cpu: 100m
      memory: 60Mi
    requests:
      cpu: 100m
      memory: 20Mi
  # Specify custom annotations to be added to each Fluent Operator pod.
  annotations: {}
  ## Reference to one or more secrets to be used when pulling images
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  imagePullSecrets: []
  # - name: "image-pull-secret"
  # Reference one more key-value pairs of labels that should be attached to fluent-operator
  labels: {}
#  myExampleLabel: someValue
  logPath:
    # The operator currently assumes a Docker container runtime path for the logs as the default, for other container runtimes you can set the location explicitly below.
    # crio: /var/log
    containerd: /var/log

fluentbit:
  image:
    repository: "kubesphere/fluent-bit"
    tag: "v2.0.9"
  # fluentbit resources. If you do want to specify resources, adjust them as necessary
  #You can adjust it based on the log volume.
  resources:
    limits:
      cpu: 500m
      memory: 200Mi
    requests:
      cpu: 10m
      memory: 25Mi
  # Specify custom annotations to be added to each FluentBit pod.
  annotations: {}
    ## Request to Fluent Bit to exclude or not the logs generated by the Pod.
    # fluentbit.io/exclude: "true"
    ## Prometheus can use this tag to automatically discover the Pod and collect monitoring data
    # prometheus.io/scrape: "true"
  # Specify additional custom labels for fluentbit-pods
  labels: {}

  ## Reference to one or more secrets to be used when pulling images
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  ##
  imagePullSecrets: [ ]
  # - name: "image-pull-secret"
  secrets: []
  # List of volumes that can be mounted by containers belonging to the pod.
  additionalVolumes: []
  # Pod volumes to mount into the container's filesystem.
  additionalVolumesMounts: []

  # Remove the above empty volumes and volumesMounts, and then set additionalVolumes and additionalVolumesMounts as below if you want to collect node exporter metrics
# additionalVolumes:
#   - name: hostProc
#     hostPath:
#       path: /proc/
#   - name: hostSys
#     hostPath:
#       path: /sys/
# additionalVolumesMounts:
#   - mountPath: /host/sys
#     mountPropagation: HostToContainer
#     name: hostSys
#     readOnly: true
#   - mountPath: /host/proc
#     mountPropagation: HostToContainer
#     name: hostProc
#     readOnly: true

  #Set a limit of memory that Tail plugin can use when appending data to the Engine.
  # You can find more details here: https://docs.fluentbit.io/manual/pipeline/inputs/tail#config
  #If the limit is reach, it will be paused; when the data is flushed it resumes.
  #if the inbound traffic is less than 2.4Mbps, setting memBufLimit to 5MB is enough
  #if the inbound traffic is less than 4.0Mbps, setting memBufLimit to 10MB is enough
  #if the inbound traffic is less than 13.64Mbps, setting memBufLimit to 50MB is enough
  input:
    tail:
      memBufLimit: 5MB
    nodeExporterMetrics: {}
    # uncomment below nodeExporterMetrics section if you want to collect node exporter metrics
#   nodeExporterMetrics:
#     tag: node_metrics
#     scrapeInterval: 15s
#     path:
#       procfs: /host/proc
#       sysfs: /host/sys

  #Configure the output plugin parameter in FluentBit.
  #You can set enable to true to output logs to the specified location.
  output:
#  You can find more supported output plugins here: https://github.com/fluent/fluent-operator/tree/master/docs/plugins/fluentbit/clusteroutput
    es:
      enable: false
      host: "<Elasticsearch url like elasticsearch-logging-data.kubesphere-logging-system.svc>"
      port: 9200
      logstashPrefix: ks-logstash-log
#      path: ""
#      bufferSize: "4KB"
#      index: "fluent-bit"
#      httpUser:
#      httpPassword:
#      logstashFormat: true
#      replaceDots: false
#      enableTLS: false
#      tls:
#        verify: On
#        debug: 1
#        caFile: "<Absolute path to CA certificate file>"
#        caPath: "<Absolute path to scan for certificate files>"
#        crtFile: "<Absolute path to private Key file>"
#        keyFile: "<Absolute path to private Key file>"
#        keyPassword:
#        vhost: "<Hostname to be used for TLS SNI extension>"
    kafka:
      enable: false
      brokers: "<kafka broker list like xxx.xxx.xxx.xxx:9092,yyy.yyy.yyy.yyy:9092>"
      topics: ks-log
    opentelemetry: {}
# You can configure the opentelemetry-related configuration here
    opensearch: {}
# You can configure the opensearch-related configuration here
    stdout:
      enable: true
    forward:
      enable: true
      host: fluentd
      port: 24224

  #Configure the default filters in FluentBit.
  # The `filter` will filter and parse the collected log information and output the logs into a uniform format. You can choose whether to turn this on or not.
  filter:
    kubernetes:
      enable: true
      labels: true
      annotations: true
    containerd:
  # This is customized lua containerd log format converter, you can refer here:
  # https://github.com/fluent/fluent-operator/blob/master/charts/fluent-operator/templates/fluentbit-clusterfilter-containerd.yaml
  # https://github.com/fluent/fluent-operator/blob/master/charts/fluent-operator/templates/fluentbit-containerd-config.yaml
      enable: true
    systemd:
      enable: true

fluentd:
  enable: true
  name: fluentd
  port: 24224
  image:
    repository: "kubesphere/fluentd"
    tag: "v1.15.3"
  replicas: 1
  forward:
    port: 24224
  watchedNamespaces:
    - default
    - kube-system
    - test-namespace
    - fluent
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 128Mi
  # Configure the output plugin parameter in Fluentd.
  # Fluentd is disabled by default, if you enable it make sure to also set up an output to use.
  output:
    es:
      enable: false
      host: elasticsearch-logging-data.kubesphere-logging-system.svc
      port: 9200
      logstashPrefix: ks-logstash-log
      buffer:
        enable: false
        type: file
        path: /buffers/es
    kafka:
      enable: false
      brokers: "my-cluster-kafka-bootstrap.default.svc:9091,my-cluster-kafka-bootstrap.default.svc:9092,my-cluster-kafka-bootstrap.default.svc:9093"
      topicKey: kubernetes_ns
      buffer:
        enable: false
        type: file
        path: /buffers/kafka
    stdout:
      enable: true
    opensearch:
      enable: true
      host: vpc-XXX-us-west-2-XXXXXXXX.us-west-2.es.amazonaws.com
      port: 9200
      logstashPrefix: logs
      buffer:
        enable: true
        type: file
        path: /buffers/opensearch

nameOverride: ""
fullnameOverride: ""
namespaceOverride: ""

Solution

  • I have found a solution.

    It seems that fluentd refuses the fluentbit connection if it can't connect to OpenSearch first.

    I was sending logs to OpenSearch on port 9200 (HTTP), so I tested port 443 instead.

    Reaching OpenSearch from the node and from the pod worked only on port 443.
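
    The kind of check that shows this, sketched with curl (assuming it is available where the check runs; the placeholder hostname matches the one above):

    # In this setup only HTTPS on 443 answered; plain HTTP on 9200 hung
    # until the timeout, matching the connect_write timeouts in the fluentd logs.
    curl -sk -m 5 -o /dev/null -w '%{http_code}\n' \
      https://vpc-XXXXX-us-west-2-XXXXXXX.us-west-2.es.amazonaws.com/
    curl -s -m 5 -o /dev/null -w '%{http_code}\n' \
      http://vpc-XXXXX-us-west-2-XXXXXXX.us-west-2.es.amazonaws.com:9200/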

    So I just added port 443 and scheme https to values.yaml. After that, logs started popping up in OpenSearch Dashboards (Kibana). The values.yaml ended up like this:

    # Default values for fluentbit-operator.
    # This is a YAML-formatted file.
    # Declare variables to be passed into your templates.
    
    #Set this to containerd or crio if you want to collect CRI format logs
    containerRuntime: docker
    #  If you want to deploy a default Fluent Bit pipeline (including Fluent Bit Input, Filter, and output) to collect Kubernetes logs, you'll need to set the Kubernetes parameter to true
    # see https://github.com/fluent/fluent-operator/tree/master/manifests/logging-stack
    Kubernetes: true
    
    operator:
    # The init container is to get the actual storage path of the docker log files so that it can be mounted to collect the logs.
    # see https://github.com/fluent/fluent-operator/blob/master/manifests/setup/fluent-operator-deployment.yaml#L26
      initcontainer:
        repository: "docker"
        tag: "20.10"
      container:
        repository: "kubesphere/fluent-operator"
        tag: "latest"
        # FluentBit operator resources. Usually user needn't to adjust these.
      resources:
        limits:
          cpu: 100m
          memory: 60Mi
        requests:
          cpu: 100m
          memory: 20Mi
      # Specify custom annotations to be added to each Fluent Operator pod.
      annotations: {}
      ## Reference to one or more secrets to be used when pulling images
      ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
      imagePullSecrets: []
      # - name: "image-pull-secret"
      # Reference one more key-value pairs of labels that should be attached to fluent-operator
      labels: {}
    #  myExampleLabel: someValue
      logPath:
        # The operator currently assumes a Docker container runtime path for the logs as the default, for other container runtimes you can set the location explicitly below.
        # crio: /var/log
        containerd: /var/log
    
    fluentbit:
      image:
        repository: "kubesphere/fluent-bit"
        tag: "v2.0.9"
      # fluentbit resources. If you do want to specify resources, adjust them as necessary
      #You can adjust it based on the log volume.
      resources:
        limits:
          cpu: 500m
          memory: 200Mi
        requests:
          cpu: 10m
          memory: 25Mi
      # Specify custom annotations to be added to each FluentBit pod.
      annotations: {}
        ## Request to Fluent Bit to exclude or not the logs generated by the Pod.
        # fluentbit.io/exclude: "true"
        ## Prometheus can use this tag to automatically discover the Pod and collect monitoring data
        # prometheus.io/scrape: "true"
      # Specify additional custom labels for fluentbit-pods
      labels: {}
    
      ## Reference to one or more secrets to be used when pulling images
      ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
      ##
      imagePullSecrets: [ ]
      # - name: "image-pull-secret"
      secrets: []
      # List of volumes that can be mounted by containers belonging to the pod.
      additionalVolumes: []
      # Pod volumes to mount into the container's filesystem.
      additionalVolumesMounts: []
    
      # Remove the above empty volumes and volumesMounts, and then set additionalVolumes and additionalVolumesMounts as below if you want to collect node exporter metrics
    # additionalVolumes:
    #   - name: hostProc
    #     hostPath:
    #       path: /proc/
    #   - name: hostSys
    #     hostPath:
    #       path: /sys/
    # additionalVolumesMounts:
    #   - mountPath: /host/sys
    #     mountPropagation: HostToContainer
    #     name: hostSys
    #     readOnly: true
    #   - mountPath: /host/proc
    #     mountPropagation: HostToContainer
    #     name: hostProc
    #     readOnly: true
    
      #Set a limit of memory that Tail plugin can use when appending data to the Engine.
      # You can find more details here: https://docs.fluentbit.io/manual/pipeline/inputs/tail#config
      #If the limit is reach, it will be paused; when the data is flushed it resumes.
      #if the inbound traffic is less than 2.4Mbps, setting memBufLimit to 5MB is enough
      #if the inbound traffic is less than 4.0Mbps, setting memBufLimit to 10MB is enough
      #if the inbound traffic is less than 13.64Mbps, setting memBufLimit to 50MB is enough
      input:
        tail:
          memBufLimit: 5MB
        nodeExporterMetrics: {}
        # uncomment below nodeExporterMetrics section if you want to collect node exporter metrics
    #   nodeExporterMetrics:
    #     tag: node_metrics
    #     scrapeInterval: 15s
    #     path:
    #       procfs: /host/proc
    #       sysfs: /host/sys
    
      #Configure the output plugin parameter in FluentBit.
      #You can set enable to true to output logs to the specified location.
      output:
    #  You can find more supported output plugins here: https://github.com/fluent/fluent-operator/tree/master/docs/plugins/fluentbit/clusteroutput
        es:
          enable: false
          host: "<Elasticsearch url like elasticsearch-logging-data.kubesphere-logging-system.svc>"
          port: 9200
          logstashPrefix: ks-logstash-log
    #      path: ""
    #      bufferSize: "4KB"
    #      index: "fluent-bit"
    #      httpUser:
    #      httpPassword:
    #      logstashFormat: true
    #      replaceDots: false
    #      enableTLS: false
    #      tls:
    #        verify: On
    #        debug: 1
    #        caFile: "<Absolute path to CA certificate file>"
    #        caPath: "<Absolute path to scan for certificate files>"
    #        crtFile: "<Absolute path to private Key file>"
    #        keyFile: "<Absolute path to private Key file>"
    #        keyPassword:
    #        vhost: "<Hostname to be used for TLS SNI extension>"
        kafka:
          enable: false
          brokers: "<kafka broker list like xxx.xxx.xxx.xxx:9092,yyy.yyy.yyy.yyy:9092>"
          topics: ks-log
        opentelemetry: {}
    # You can configure the opentelemetry-related configuration here
        opensearch: {}
    # You can configure the opensearch-related configuration here
        stdout:
          enable: true
        # forward: # {{- if .Values.Kubernetes -}} {{- if .Values.fluentd.enable -}}
        #   host: fluentd.fluent.svc.cluster.local # host: {{ .Values.fluentd.name }}.{{ .Release.Namespace }}.svc on fluentbit-output-forward.yaml
        #   port: 24224 # {{ .Values.fluentd.forward.port }}
    
      #Configure the default filters in FluentBit.
      # The `filter` will filter and parse the collected log information and output the logs into a uniform format. You can choose whether to turn this on or not.
      filter:
        kubernetes:
          enable: true
          labels: true
          annotations: true
        containerd:
      # This is customized lua containerd log format converter, you can refer here:
      # https://github.com/fluent/fluent-operator/blob/master/charts/fluent-operator/templates/fluentbit-clusterfilter-containerd.yaml
      # https://github.com/fluent/fluent-operator/blob/master/charts/fluent-operator/templates/fluentbit-containerd-config.yaml
          enable: false
        systemd:
          enable: false
    
    fluentd:
      enable: true
      name: fluentd
      port: 24224 # port: {{ .Values.fluentd.port }} on fluentd-fluentd.yaml
      image:
        repository: "kubesphere/fluentd"
        tag: "v1.15.3"
      replicas: 1
      forward:
        port: 24224 # port: {{ .Values.fluentd.forward.port }} on fluentbit-output-forward.yaml
      watchedNamespaces:
        - fluent
        - observability-system
        - default
      resources:
        limits:
          cpu: 500m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 128Mi
      # Configure the output plugin parameter in Fluentd.
      # Fluentd is disabled by default, if you enable it make sure to also set up an output to use.
      output:
        es:
          enable: false
          host: elasticsearch-logging-data.kubesphere-logging-system.svc
          port: 9200
          logstashPrefix: ks-logstash-log
          buffer:
            enable: false
            type: file
            path: /buffers/es
        kafka:
          enable: false
          brokers: "my-cluster-kafka-bootstrap.default.svc:9091,my-cluster-kafka-bootstrap.default.svc:9092,my-cluster-kafka-bootstrap.default.svc:9093"
          topicKey: kubernetes_ns
          buffer:
            enable: false
            type: file
            path: /buffers/kafka
        stdout:
          enable: true
        opensearch:
          enable: true
          host: vpc-XXXXX-us-west-2-XXXXXXXX.us-west-2.es.amazonaws.com
          port: 443
          logstashPrefix: logs
          scheme: https
          # buffer:
          #   enable: false
          #   type: file
          #   path: /buffers/opensearch
    
    nameOverride: ""
    fullnameOverride: ""
    namespaceOverride: ""
    

    Keep in mind that fluentd is running in a Kubernetes cluster (EKS).

    Another issue I had to face: after upgrading the fluent-operator release, the changes weren't applied to the fluentd pod.

    This is because the chart's fluentd output template doesn't handle parameters like scheme.

    The CRD does, though: https://github.com/fluent/helm-charts/blob/main/charts/fluent-operator/crds/fluentd.fluent.io_clusteroutputs.yaml#L1411.

    So I applied the change manually and then killed the fluentd pod. After that, the pod picked up the change and rendered the https scheme:
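
    Applied by hand, the change amounts to editing the ClusterOutput and bouncing the pod; a sketch of those two steps:

    # add `scheme: https` and set `port: 443` under spec.outputs[0].opensearch
    # (group-qualified, since fluentbit also defines a ClusterOutput kind)
    kubectl edit clusteroutputs.fluentd.fluent.io fluentd-output-opensearch
    # delete the pod so the StatefulSet recreates it with the regenerated config
    kubectl delete pod fluentd-0 -n fluent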

    kubectl get clusteroutput fluentd-output-opensearch -o yaml
    
    apiVersion: fluentd.fluent.io/v1alpha1
    kind: ClusterOutput
    metadata:
      annotations:
        meta.helm.sh/release-name: fluent-operator
        meta.helm.sh/release-namespace: fluent
      creationTimestamp: "2023-02-15T20:35:26Z"
      generation: 2
      labels:
        app.kubernetes.io/managed-by: Helm
        output.fluentd.fluent.io/enabled: "true"
      name: fluentd-output-opensearch
      resourceVersion: "14073767"
      uid: 9705d00f-5c10-4b32-916c-f6a487a3ac70
    spec:
      outputs:
      - opensearch:
          host: vpc-XXXXX-us-west-2-XXXXXX.us-west-2.es.amazonaws.com
          logstashFormat: true
          logstashPrefix: logs
          port: 443
          scheme: https
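
    As a final check, both sets of pod logs should quiet down: fluentd should stop logging the connect_write timeouts and fluentbit should stop logging Connection refused:

    kubectl logs -n fluent fluentd-0 --tail=50
    kubectl logs -n fluent -l app.kubernetes.io/name=fluent-bit --tail=50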