Otel Collector, TailSampling and SpanName


I'm trying to use the OTEL Collector to filter out Traces before they are emitted to my store.

I am using otel/opentelemetry-collector-contrib:latest, with the tail_sampling processor.

I have an event which fires about 1200 times per second. I want to drop 90% of the successful traces whose name is this specific event. I want to keep all traces with this name which are either in an error state themselves, or contain a span which is in an error state.

I do not want to alter the sampling of any other traces emitted by my application.

I want the simplest possible configuration to achieve this.

I have the otel-collector-config.yaml below, but am having trouble getting it to:

  1. Correctly filter on Span Name.
  2. Correctly keep all other Spans.
  3. Filter out 90% of Traces/Spans for my matching events.

processors:

  batch:
    send_batch_size: 10000
    timeout: 1s

  tail_sampling:
    decision_wait: 30s
    num_traces: 50000
    policies:
      [
        {
          name: downsample-event,
          type: and,
          and: {
            and_sub_policy:
              [
                {
                  name: name_filter, type: ottl_condition,
                  ottl_condition: { error_mode: ignore, span: [ "name == \"MyEvent\"" ] }
                },
                { name: is_ok, type: status_code, status_code: { status_codes: [OK] } },
                { name: prob, type: probabilistic, probabilistic: { sampling_percentage: 10 } }
              ]
          }
        },
        {
          name: capture-everthing-else,
          type: always_sample
        },
        ]

My intention with this tail_sampling configuration is as follows:

  1. Spans matching the name 'MyEvent', with a Status Code of OK, should be sampled at 10%.
  2. Every other span should be captured.
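
To make the intent above concrete, here it is written out as a plain decision function. This is only a sketch of the behaviour I am after, in Python, not collector configuration. The `Span` class, the `keep_trace` function, and the rule that the first span in the list is the root span are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    name: str
    status: str  # one of "OK", "ERROR", "UNSET"


def keep_trace(
    spans: List[Span],
    event_name: str = "MyEvent",
    keep_ratio: float = 0.10,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the whole trace should be kept.

    Assumption: spans[0] is the root span of the trace.
    """
    if rng is None:
        rng = random.Random()
    root = spans[0]
    # Traces for any other event are never downsampled.
    if root.name != event_name:
        return True
    # Keep the full tree if the root or any sub-span is in an error state.
    if any(span.status == "ERROR" for span in spans):
        return True
    # Successful MyEvent traces: keep roughly 10%.
    return rng.random() < keep_ratio


# Deterministic cases: an unrelated trace and a MyEvent trace with a failing child.
print(keep_trace([Span("Checkout", "OK")]))
print(keep_trace([Span("MyEvent", "OK"), Span("child", "ERROR")]))
```

Both calls print `True`. Only a MyEvent trace with no error anywhere in it reaches the probabilistic branch.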

Current holes:

  1. For some reason, I cannot filter correctly on Name.
  2. I am not sure that this will keep the full 'tree' of Spans when a sub-span of MyEvent contains an error.

ChatGPT is giving me extremely unhelpful suggestions for this configuration, and so I've come to ask the experts.


Solution

  • As of Dec 2024, this is not possible with the policies available in the Otel Collector's tail sampling processor.

    This is because you cannot filter on the NAME of one specific SPAN in a Trace.

    Why? The match is evaluated against ANY span in a Trace, so ANY child span not named 'MyEvent' INSIDE the 'MyEvent' root-level span/trace WILL BE MATCHED, and this causes us to capture all traces again.
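
The "ANY span" behaviour described above can be illustrated with a toy model. This is a minimal sketch in Python, not the collector's actual implementation. The `trace_matches` function and the child span names `db.query` and `http.call` are hypothetical and exist only for the example.

```python
from typing import Callable, List


def trace_matches(span_names: List[str], condition: Callable[[str], bool]) -> bool:
    """Toy model: a trace matches when ANY of its spans satisfies the condition."""
    return any(condition(name) for name in span_names)


def is_my_event(name: str) -> bool:
    return name == "MyEvent"


def is_not_my_event(name: str) -> bool:
    return name != "MyEvent"


# A MyEvent trace whose child spans have other (hypothetical) names.
my_event_trace = ["MyEvent", "db.query", "http.call"]
# An unrelated trace that should always be kept.
other_trace = ["Checkout", "db.query"]

print(trace_matches(my_event_trace, is_my_event))      # True
print(trace_matches(other_trace, is_not_my_event))     # True
# The problem case: the children of MyEvent are "not MyEvent",
# so a "keep everything else" condition matches this trace too.
print(trace_matches(my_event_trace, is_not_my_event))  # True
```

Under this model, a name condition cannot tell "a trace rooted at MyEvent" apart from "a trace that contains some span not named MyEvent", which is why every trace ends up being captured.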