I am looking into OpenTelemetry by collecting logs and traces and sending them to an OpenSearch instance. I am following the examples from the OpenSearch documentation, but I run into issues when the collector tries to send data to Data Prepper. I am not sure what I am doing wrong at this point.
I am using a Docker Compose file to start the services:
name: log-aggregation
services:
  opensearch:
    ports:
      - "9200:9200"
      - "9600:9600"
    networks:
      - log-aggregation
    image: docker.io/opensearchproject/opensearch:latest
    environment:
      "discovery.type": 'single-node'
      "plugins.security.disabled": 'true'
      OPENSEARCH_INITIAL_ADMIN_PASSWORD: <password>
  opensearch-dashboard:
    ports:
      - "5601:5601"
    networks:
      - log-aggregation
    build: ./opensearch/
    image: opensearch-dashboards-no-security
    environment:
      OPENSEARCH_HOSTS: "http://opensearch:9200"
      "server.ssl.enabled": "false"
  data-prepper:
    ports:
      - "4900:4900"
      - "21890:21890"
    networks:
      - log-aggregation
    image: docker.io/opensearchproject/data-prepper:latest
    volumes:
      - '.\opensearch\pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml'
      - '.\opensearch\data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml'
  otel-collector:
    image: docker.io/otel/opentelemetry-collector:latest
    ports:
      - "4317:4317"
      - "4318:4318"
    networks:
      - log-aggregation
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - "./opentelemetry/collector-config.yaml:/etc/otel/config.yaml"
networks:
  log-aggregation: {}
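For completeness, the stack is brought up and the Data Prepper logs followed with the usual Compose commands (nothing specific to this setup):
docker compose up -d --build
docker compose ps
docker compose logs -f data-prepper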
The Dockerfile for the opensearch-dashboards-no-security image looks like this:
FROM opensearchproject/opensearch-dashboards:latest
RUN /usr/share/opensearch-dashboards/bin/opensearch-dashboards-plugin remove securityDashboards
COPY --chown=opensearch-dashboards:opensearch-dashboards opensearch_dashboards.yml /usr/share/opensearch-dashboards/config/
I am aware that this is not suitable for production. This is just a test setup for evaluation. I can use the dashboard and the OpenSearch instance as expected.
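As a quick sanity check (plain OpenSearch REST calls, nothing specific to this setup), the node and the cluster health can be queried directly:
curl.exe http://localhost:9200
curl.exe "http://localhost:9200/_cluster/health?pretty"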
I set up the OTel Collector using this configuration:
receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
exporters:
  debug:
    verbosity: detailed
  otlp/data-prepper:
    endpoint: data-prepper:21890
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/data-prepper]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/data-prepper]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/data-prepper]
  telemetry:
    logs:
      level: "debug"
Data Prepper is set up with the following config (by not specifying the authentication option, Data Prepper should not require any authentication, as per their documentation):
ssl: false
I then set up these pipelines:
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
      authentication:
        unauthenticated:
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index_type: trace-analytics-raw
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index_type: trace-analytics-service-map
I can query the Data Prepper instance to check that the pipelines have been set up:
curl.exe localhost:4900/list
{"pipelines":[{"name":"entry-pipeline"},{"name":"service-map-pipeline"},{"name":"raw-pipeline"}]}
But when I send a test trace to the OTel Collector, I only get a partial success in the response:
curl.exe -i http://localhost:4318/v1/traces -H 'Content-Type: application/json' -d '@opentelemetry/span.json'
HTTP/1.1 200 OK
Content-Type: application/json
Date: Sun, 21 Jul 2024 11:15:49 GMT
Content-Length: 21
{"partialSuccess":{}}
and the collector logs show the connection error "received goaway and there are no active streams":
2024-07-21T11:06:49.213Z info zapgrpc/zapgrpc.go:176 [core] [Channel #3 SubChannel #6]Subchannel created {"grpc_log": true}
2024-07-21T11:06:49.213Z info zapgrpc/zapgrpc.go:176 [core] [Channel #3]Channel Connectivity change to CONNECTING {"grpc_log": true}
2024-07-21T11:06:49.213Z info zapgrpc/zapgrpc.go:176 [core] [Channel #3 SubChannel #6]Subchannel Connectivity change to CONNECTING {"grpc_log": true}
2024-07-21T11:06:49.213Z info zapgrpc/zapgrpc.go:176 [core] [Channel #3 SubChannel #6]Subchannel picks a new address "10.89.1.14:21890" to connect {"grpc_log": true}
2024-07-21T11:06:49.213Z info zapgrpc/zapgrpc.go:176 [pick-first-lb] [pick-first-lb 0xc00074b620] Received SubConn state update: 0xc00074b6b0, {ConnectivityState:CONNECTING ConnectionError:<nil>} {"grpc_log": true}
2024-07-21T11:06:49.269Z info zapgrpc/zapgrpc.go:176 [core] [Channel #3 SubChannel #6]Subchannel Connectivity change to READY {"grpc_log": true}
2024-07-21T11:06:49.269Z info zapgrpc/zapgrpc.go:176 [pick-first-lb] [pick-first-lb 0xc00074b620] Received SubConn state update: 0xc00074b6b0, {ConnectivityState:READY ConnectionError:<nil>} {"grpc_log": true}
2024-07-21T11:06:49.269Z info zapgrpc/zapgrpc.go:176 [core] [Channel #3]Channel Connectivity change to READY {"grpc_log": true}
2024-07-21T11:07:04.348Z info zapgrpc/zapgrpc.go:176 [core] [Channel #3 SubChannel #6]Subchannel Connectivity change to IDLE {"grpc_log": true}
2024-07-21T11:07:04.348Z info zapgrpc/zapgrpc.go:176 [transport] [client-transport 0xc000ba2248] Closing: connection error: desc = "received goaway and there are no active streams" {"grpc_log": true}
2024-07-21T11:07:04.348Z info zapgrpc/zapgrpc.go:176 [transport] [client-transport 0xc000ba2248] loopyWriter exiting with error: connection error: desc = "received goaway and there are no active streams" {"grpc_log": true}
The Data Prepper port seems to be correct: when I configure any other port, such as 21892 (which I have seen in other questions), the collector gets a connection refused error instead.
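As far as I can tell, 21890 is the default port of otel_trace_source; it can also be pinned explicitly in the source block to rule out a mismatch (assuming the option is simply named port):
source:
  otel_trace_source:
    ssl: false
    port: 21890
    authentication:
      unauthenticated: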
It seems the example is a bit out of date. I looked at the Data Prepper logs and found two warnings about deprecated plugin names. After switching to the updated processor names, restarting the containers, and resending the test trace to the collector, I still see the connection error, but the test trace now appears in OpenSearch after a few seconds.
Data Prepper Logs:
2024-07-23T16:53:10,348 [main] WARN org.opensearch.dataprepper.plugin.DefaultPluginFactory - Plugin name 'service_map_stateful' is deprecated and will be removed in the next major release. Consider using the updated plugin name 'service_map'.
2024-07-23T16:53:10,573 [main] WARN org.opensearch.dataprepper.plugin.DefaultPluginFactory - Plugin name 'otel_trace_raw' is deprecated and will be removed in the next major release. Consider using the updated plugin name 'otel_traces'.
The updated pipelines.yaml:
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
      authentication:
        unauthenticated:
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  processor:
-   - otel_trace_raw:
+   - otel_traces:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index_type: trace-analytics-raw
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  buffer:
    bounded_blocking:
      buffer_size: 10240
      batch_size: 160
  processor:
-   - service_map_stateful:
+   - service_map:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index_type: trace-analytics-service-map
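To confirm the trace actually landed, the indices can be listed and searched directly in OpenSearch (otel-v1-apm-span is the index name I believe the trace-analytics-raw index type creates; adjust if yours differs):
curl.exe "http://localhost:9200/_cat/indices?v"
curl.exe "http://localhost:9200/otel-v1-apm-span*/_search?pretty&size=1"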