open-telemetry, opensearch, open-telemetry-collector

OTel collector gets connection error when sending data to OpenSearch Data Prepper


I am looking into OpenTelemetry by collecting logs and traces and sending them to an OpenSearch instance. I am following the examples from the OpenSearch documentation, but I seem to run into issues when the collector tries to send data to Data Prepper. I am not sure what I am doing wrong at this point.

I am using a Docker Compose file to start the services:

name: log-aggregation

services:
  opensearch:
    ports:
      - "9200:9200"
      - "9600:9600"
    networks:
      - log-aggregation
    image: docker.io/opensearchproject/opensearch:latest
    environment:
      "discovery.type": 'single-node'
      "plugins.security.disabled": 'true'
      OPENSEARCH_INITIAL_ADMIN_PASSWORD: <password>

  opensearch-dashboard:
    ports:
      - "5601:5601"
    networks:
      - log-aggregation
    build: ./opensearch/
    image: opensearch-dashboards-no-security
    environment:
      OPENSEARCH_HOSTS: "http://opensearch:9200"
      "server.ssl.enabled": "false"

  data-prepper:
    ports:
      - "4900:4900"
      - "21890:21890"
    networks:
      - log-aggregation
    image: docker.io/opensearchproject/data-prepper:latest
    volumes:
      - '.\opensearch\pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml'
      - '.\opensearch\data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml'

  otel-collector:
    image: docker.io/otel/opentelemetry-collector:latest
    ports:
      - "4317:4317"
      - "4318:4318"
    networks:
      - log-aggregation
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - "./opentelemetry/collector-config.yaml:/etc/otel/config.yaml"

networks:
  log-aggregation: {}
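
With the compose file in place, the stack is started with the usual Docker Compose commands; tailing the Data Prepper logs is a quick way to see whether the pipelines come up (standard commands, nothing specific to this setup):

docker compose up -d
docker compose ps
docker compose logs -f data-prepper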

The Dockerfile for the opensearch-dashboards-no-security image looks like this:

FROM opensearchproject/opensearch-dashboards:latest
RUN /usr/share/opensearch-dashboards/bin/opensearch-dashboards-plugin remove securityDashboards       
COPY --chown=opensearch-dashboards:opensearch-dashboards opensearch_dashboards.yml /usr/share/opensearch-dashboards/config/
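
The opensearch_dashboards.yml that gets copied into the image is not shown here; a minimal sketch of what such a no-security dashboards config might contain (a hypothetical example, not the actual file from this setup) is:

# hypothetical minimal opensearch_dashboards.yml for a setup without the security plugin;
# the actual file used here is not shown in the question
server.host: "0.0.0.0"
opensearch.hosts: ["http://opensearch:9200"]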

I am aware that this is not suitable for production; this is just a test setup for evaluation. I can use the dashboard and the OpenSearch instance as expected.
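
For example, a plain request against the OpenSearch REST API answers without authentication and reports the cluster state (a generic sanity check against standard OpenSearch endpoints):

curl.exe http://localhost:9200
curl.exe "http://localhost:9200/_cluster/health?pretty"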

I set up the OTel collector using this configuration:

receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  debug:
    verbosity: detailed
  otlp/data-prepper:
    endpoint: data-prepper:21890
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/data-prepper]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/data-prepper]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/data-prepper]
  telemetry:
    logs:
      level: "debug"

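As a side note, recent collector builds include a validate subcommand that can catch problems in this file before starting the container; whether the exact invocation below works depends on the collector version in the image:

docker run --rm -v "${PWD}/opentelemetry/collector-config.yaml:/etc/otel/config.yaml" docker.io/otel/opentelemetry-collector:latest validate --config=/etc/otel/config.yaml
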
Data Prepper is set up with the following config (according to the documentation, not specifying the authentication option means Data Prepper should not require any authentication):

ssl: false

I then set up these pipelines:

entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
      authentication:
        unauthenticated:
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index_type: trace-analytics-raw
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index_type: trace-analytics-service-map

I can query the Data Prepper instance to check that the pipelines have been set up:

curl.exe localhost:4900/list
{"pipelines":[{"name":"entry-pipeline"},{"name":"service-map-pipeline"},{"name":"raw-pipeline"}]}

But when I send a test trace to the OTel collector, I only get a partial success in the response:

curl.exe -i http://localhost:4318/v1/traces -H 'Content-Type: application/json' -d '@opentelemetry/span.json'  
HTTP/1.1 200 OK
Content-Type: application/json
Date: Sun, 21 Jul 2024 11:15:49 GMT
Content-Length: 21

{"partialSuccess":{}}

and the collector logs show the connection error "received goaway and there are no active streams":

2024-07-21T11:06:49.213Z        info    zapgrpc/zapgrpc.go:176  [core] [Channel #3 SubChannel #6]Subchannel created     {"grpc_log": true}
2024-07-21T11:06:49.213Z       info    zapgrpc/zapgrpc.go:176  [core] [Channel #3]Channel Connectivity change to CONNECTING    {"grpc_log": true}
2024-07-21T11:06:49.213Z       info    zapgrpc/zapgrpc.go:176  [core] [Channel #3 SubChannel #6]Subchannel Connectivity change to CONNECTING   {"grpc_log": true}
2024-07-21T11:06:49.213Z       info    zapgrpc/zapgrpc.go:176  [core] [Channel #3 SubChannel #6]Subchannel picks a new address "10.89.1.14:21890" to connect   {"grpc_log": true}
2024-07-21T11:06:49.213Z       info    zapgrpc/zapgrpc.go:176  [pick-first-lb] [pick-first-lb 0xc00074b620] Received SubConn state update: 0xc00074b6b0, {ConnectivityState:CONNECTING ConnectionError:<nil>}  {"grpc_log": true}
2024-07-21T11:06:49.269Z       info    zapgrpc/zapgrpc.go:176  [core] [Channel #3 SubChannel #6]Subchannel Connectivity change to READY        {"grpc_log": true}
2024-07-21T11:06:49.269Z       info    zapgrpc/zapgrpc.go:176  [pick-first-lb] [pick-first-lb 0xc00074b620] Received SubConn state update: 0xc00074b6b0, {ConnectivityState:READY ConnectionError:<nil>}       {"grpc_log": true}
2024-07-21T11:06:49.269Z       info    zapgrpc/zapgrpc.go:176  [core] [Channel #3]Channel Connectivity change to READY {"grpc_log": true}
2024-07-21T11:07:04.348Z       info    zapgrpc/zapgrpc.go:176  [core] [Channel #3 SubChannel #6]Subchannel Connectivity change to IDLE {"grpc_log": true}
2024-07-21T11:07:04.348Z       info    zapgrpc/zapgrpc.go:176  [transport] [client-transport 0xc000ba2248] Closing: connection error: desc = "received goaway and there are no active streams" {"grpc_log": true}
2024-07-21T11:07:04.348Z       info    zapgrpc/zapgrpc.go:176  [transport] [client-transport 0xc000ba2248] loopyWriter exiting with error: connection error: desc = "received goaway and there are no active streams"  {"grpc_log": true}

The Data Prepper port seems to be correct: when I configure any other port, such as 21892 (which I have seen in other questions), the collector gets a connection refused error instead.


Solution

  • It seems that the example is a bit out of date. I looked at the Data Prepper instance logs and found two deprecation warnings about the processor plugin names. After switching the processors to the updated names, restarting the containers, and resending the test trace to the collector, the trace now appears in OpenSearch after a few seconds, although the goaway connection error still shows up in the collector logs. (A quick index check to confirm the trace arrived is shown after the updated pipelines.yaml below.)

    Data Prepper Logs:

    2024-07-23T16:53:10,348 [main] WARN  org.opensearch.dataprepper.plugin.DefaultPluginFactory - Plugin name 'service_map_stateful' is deprecated and will be removed in the next major release. Consider using the updated plugin name 'service_map'.
    2024-07-23T16:53:10,573 [main] WARN  org.opensearch.dataprepper.plugin.DefaultPluginFactory - Plugin name 'otel_trace_raw' is deprecated and will be removed in the next major release. Consider using the updated plugin name 'otel_traces'.
    

    pipelines.yaml:

    entry-pipeline:
      delay: "100"
      source:
        otel_trace_source:
          ssl: false
          authentication:
            unauthenticated:
      buffer:
        bounded_blocking:
          buffer_size: 10240
          batch_size: 160
      sink:
        - pipeline:
            name: "raw-pipeline"
        - pipeline:
            name: "service-map-pipeline"
    raw-pipeline:
      source:
        pipeline:
          name: "entry-pipeline"
      buffer:
        bounded_blocking:
          buffer_size: 10240
          batch_size: 160
      processor:
    -    - otel_trace_raw:
    +    - otel_traces:
      sink:
        - opensearch:
            hosts: ["http://opensearch:9200"]
            insecure: true
            index_type: trace-analytics-raw
    service-map-pipeline:
      delay: "100"
      source:
        pipeline:
          name: "entry-pipeline"
      buffer:
        bounded_blocking:
          buffer_size: 10240
          batch_size: 160
      processor:
    -    - service_map_stateful:
    +    - service_map:
      sink:
        - opensearch:
            hosts: ["http://opensearch:9200"]
            insecure: true
            index_type: trace-analytics-service-map
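
    To confirm the trace actually reached OpenSearch, the trace-analytics indices can be listed and searched. The index pattern below assumes Data Prepper's default naming for the raw span index (otel-v1-apm-span-*); adjust it if your version writes to differently named indices:

    curl.exe "http://localhost:9200/_cat/indices/otel-v1-apm*?v"
    curl.exe "http://localhost:9200/otel-v1-apm-span-*/_search?size=1&pretty"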