I'm configuring processors for the OpenTelemetry Collector, and I'm running into some confusion about the recommended order of processors, along with an issue specifically involving the k8sattributes processor.
When I use the following order of processors in my configuration:
processors: [memory_limiter, tail_sampling, k8sattributes, attributes/insert, batch]
I'm not seeing the Kubernetes attributes in my traces.
However, when I switch the order to:
processors: [k8sattributes, memory_limiter, batch, attributes/insert, tail_sampling]
I do see the Kubernetes attributes in traces.
When I look at the documentation for the batch processor, it states:
It is highly recommended to configure the batch processor on every collector. The batch processor should be defined in the pipeline after the memory_limiter as well as any sampling processors. This is because batching should happen after any data drops such as sampling.
Considering this recommendation, I'm unsure what is making the k8sattributes processor not work. Could someone clarify the issue or tell me where I am going wrong?
Also, please let me know the recommended order of processors for the OpenTelemetry Collector configuration, particularly in relation to the batch processor, and any other relevant considerations.
Thank you.
Initially, I configured the processors in the following order in my YAML configuration file:
processors:
  tail_sampling:
    decision_wait: 15s
    policies:
      [
        {
          name: always-sample-policy,
          type: always_sample
        },
      ]
  batch:
  memory_limiter:
    # 80% of maximum memory up to 2G
    limit_mib: 1500
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s
  attributes/insert:
    actions:
      - key: "cluster"
        value: ""
        action: insert
  k8sattributes:
    auth_type: "serviceAccount"
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection

service:
  extensions: [health_check, zpages, memory_ballast]
  pipelines:
    traces/1:
      receivers: [otlp, zipkin, jaeger]
      processors: [memory_limiter, tail_sampling, k8sattributes, attributes/insert, batch]
      exporters: [zipkin]
In this configuration, I expected the OpenTelemetry Collector to include Kubernetes attributes in the traces it collects.
Expectation: The Kubernetes attributes should be present in the collected traces.
Actual Result: The Kubernetes attributes were not included in the traces.
Further Experimentation: In an attempt to troubleshoot the issue, I rearranged the order of processors in the configuration file to the following:
processors: [k8sattributes, memory_limiter, batch, attributes/insert, tail_sampling]
I was able to successfully capture Kubernetes attributes in the traces.
I actually found the solution to my issue.
In the documentation on the k8sattributes processor it is stated that the connection source takes the IP attribute from the connection context (if available), and that in this case the k8sattributes processor must appear before any batching or tail sampling, which remove this information.
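In my configuration, the part that depends on that connection context is, as far as I can tell, the last pod_association rule, which falls back to the incoming connection when no resource attribute matches (trimmed from the config above):

k8sattributes:
  pod_association:
    # this source has no resource attribute to fall back on; it uses the IP
    # taken from the incoming connection, which (per the docs quoted above)
    # is lost once spans are re-batched by batch or tail_sampling
    - sources:
        - from: connection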
And the documentation of the tail sampling processor states that it must be placed in pipelines after any processors that rely on context, e.g. k8sattributes, because it reassembles spans into new batches, causing them to lose their original context.
That is why I was able to see the Kubernetes-related attributes only when k8sattributes was placed before tail_sampling in the processor order.
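Putting those two constraints together with the batch processor guidance quoted in the question, an ordering that should, as far as I understand it, satisfy all of them would look like this (a sketch based on my pipeline; I have not verified this exact order myself):

service:
  pipelines:
    traces/1:
      receivers: [otlp, zipkin, jaeger]
      # memory_limiter first, k8sattributes before anything that re-batches
      # spans, tail_sampling after context-dependent processors, batch last
      processors: [memory_limiter, k8sattributes, attributes/insert, tail_sampling, batch]
      exporters: [zipkin]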
Hope this helps someone who is facing a similar issue.
Thank you.