I am collecting OpenTelemetry (OTEL) metrics from my services, forwarding them to an OTEL Collector, and then sending them to Prometheus through its OTLP endpoint. Here is a sample OTEL Collector configuration I am using:
# Receivers
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        auth:
          authenticator: bearertokenauth
# Processors
processors:
  batch: {}
# Exporters
exporters:
  otlphttp/logs:
    endpoint: "http://loki-write:3100/otlp"
  otlphttp/metrics:
    endpoint: "http://prometheus-server/api/v1/otlp"
  otlphttp/traces:
    endpoint: "http://tempo-distributor:4318"
# Pipelines
service:
  extensions:
    - health_check
    - bearertokenauth
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/logs]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/metrics]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/traces]
Problem:
I can view most metrics in Prometheus (via Grafana), but histogram metrics are missing. For example, metrics like http_server_request_duration_sum do not appear in Prometheus.
Troubleshooting Steps:
... (http_server_request_duration_sum). The only difference is that "." in metric names is not replaced with "_".
Additional Context:
Question:
What could be the reason for histogram metrics not appearing in Prometheus? Are there any specific configurations in the OTEL Collector, Prometheus, or Grafana that I need to check?
The OTel client SDKs (including the .NET client) do not use exponential histograms by default, so first remove the native histograms feature flag from the Prometheus configuration. See https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/docs/metrics/customizing-the-sdk/README.md#configuring-the-aggregation-of-a-histogram
The OTel client sends explicit-bucket histograms, which can only be converted to classic Prometheus histograms. See https://www.prometheus.io/docs/specs/native_histograms/#otlp
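As a rough illustration of the flag change described above, here is how the Prometheus startup arguments might look in a Docker Compose file. The service name, image tag, and file layout are assumptions made for this sketch and are not taken from the question; only the flags themselves are real Prometheus options. Adapt the same idea to whatever mechanism (Helm values, systemd unit, and so on) starts your Prometheus.

```yaml
# docker-compose.yml (hypothetical layout, for illustration only)
services:
  prometheus:                          # assumed service name
    image: prom/prometheus:v3.0.0      # assumed version
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      # Accept OTLP pushes under /api/v1/otlp (Prometheus 3.x flag;
      # 2.x releases used --enable-feature=otlp-write-receiver instead).
      - --web.enable-otlp-receiver
      # Removed as suggested above, since the client sends explicit-bucket
      # histograms rather than exponential ones:
      # - --enable-feature=native-histograms
```

After restarting Prometheus with the flag removed, it is worth checking whether series such as http_server_request_duration_bucket, _sum, and _count start to appear.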
OTEL_METRIC_EXPORT_INTERVAL in the OTel client SDK defaults to 60 seconds, whereas Grafana expects 15 seconds. There are two options: either set it to 15 seconds in the OTel client, as described at https://prometheus.io/docs/guides/opentelemetry/, or change the scrape interval to 60 seconds in Grafana, as described at https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/
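The two options above could be sketched as the following pair of YAML fragments. They belong in two separate files (separated here by ---), and you would normally apply only one of them. The service name, data source name, and URL are placeholders assumed for this example. Note that OTEL_METRIC_EXPORT_INTERVAL is given in milliseconds, and that timeInterval is the provisioning key behind the "Scrape interval" field of Grafana's Prometheus data source.

```yaml
# Option 1: shorten the export interval on the instrumented service
# (Compose-style fragment; "my-service" is a hypothetical name).
services:
  my-service:
    environment:
      OTEL_METRIC_EXPORT_INTERVAL: "15000"   # milliseconds, i.e. 15 s
---
# Option 2: tell Grafana that samples arrive every 60 s
# (Grafana data source provisioning file; name and URL are assumptions).
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server
    jsonData:
      timeInterval: 60s   # matches the SDK's default 60 s export interval
```

With option 2, queries that use $__rate_interval should pick a range wide enough to cover several 60-second samples.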