I am following this document to set up distributed tracing: https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengistio-intro-topic.htm#exploring_istio_observability
My cluster runs on GKE (GCP) for testing purposes. I installed Istio on top of it, followed the document, and set up the services.
The services are up and running with Prometheus, Grafana, Jaeger and Zipkin.
It fails at the step "Performing Distributed Tracing with OCI Application Performance Monitoring".
I tried updating the ConfigMap for the sidecar injector so that I can push tracing details to the Zipkin domain.
I configured the Zipkin domain and am using public-span for now in the ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-custom-bootstrap-config
  namespace: default
data:
  custom_bootstrap.json: |
    {
      "tracing": {
        "http": {
          "name": "envoy.tracers.zipkin",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
            "collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
            "collector_endpoint": "/20200101/observations/private-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=2C6YOLQSUZ5Q7IGN", // [Replace with the private datakey of your apm domain. You can also use public datakey but change the observation type to public-span]
            "collectorEndpointVersion": "HTTP_JSON",
            "trace_id_128bit": true,
            "shared_span_context": false
          }
        }
      },
      "static_resources": {
        "clusters": [{
          "name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain:443]
          "type": "STRICT_DNS",
          "lb_policy": "ROUND_ROBIN",
          "load_assignment": {
            "cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
            "endpoints": [{
              "lb_endpoints": [{
                "endpoint": {
                  "address": {
                    "socket_address": {
                      "address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
                      "port_value": 443
                    }
                  }
                }
              }]
            }]
          },
          "transport_socket": {
            "name": "envoy.transport_sockets.tls",
            "typed_config": {
              "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
              "sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com" // [Replace this with data upload endpoint of your apm domain]
            }
          }
        }]
      }
    }
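The comments in the ConfigMap say the span type in collector_endpoint has to match the kind of data key (private-span with a private key, public-span with a public key). As a rough sketch of how that path is assembled, the helper below builds it from those two inputs. The function name and the example key are made up for illustration; only the path layout and query parameters come from the ConfigMap.

```python
from urllib.parse import urlencode


def collector_endpoint(data_key, private=False):
    """Build the APM collector path used in the ZipkinConfig (hypothetical helper)."""
    # The span type must match the kind of data key that is passed in.
    span_type = "private-span" if private else "public-span"
    query = urlencode({
        "dataFormat": "zipkin",
        "dataFormatVersion": "2",
        "dataKey": data_key,
    })
    return f"/20200101/observations/{span_type}?{query}"


# EXAMPLEKEY is a placeholder, not a real data key.
print(collector_endpoint("EXAMPLEKEY"))
print(collector_endpoint("EXAMPLEKEY", private=True))
```

Generating the path this way keeps the span type and the key choice in one place, so they are less likely to drift apart when switching between public and private keys.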
The ConfigMap does not work as expected. The sidecar was crashing because the connect_timeout key was missing; after adding it to the ConfigMap, the sidecar no longer shows that error.
There are no errors in the Zipkin or Istiod containers, and I am not sure how to debug further.
Error log:
2022-03-30T05:59:33.146580Z info FLAG: --concurrency="2"
2022-03-30T05:59:33.146632Z info FLAG: --domain="default.svc.cluster.local"
2022-03-30T05:59:33.146642Z info FLAG: --help="false"
2022-03-30T05:59:33.146648Z info FLAG: --log_as_json="false"
2022-03-30T05:59:33.146672Z info FLAG: --log_caller=""
2022-03-30T05:59:33.146678Z info FLAG: --log_output_level="default:info"
2022-03-30T05:59:33.146682Z info FLAG: --log_rotate=""
2022-03-30T05:59:33.146687Z info FLAG: --log_rotate_max_age="30"
2022-03-30T05:59:33.146693Z info FLAG: --log_rotate_max_backups="1000"
2022-03-30T05:59:33.146699Z info FLAG: --log_rotate_max_size="104857600"
2022-03-30T05:59:33.146704Z info FLAG: --log_stacktrace_level="default:none"
2022-03-30T05:59:33.146715Z info FLAG: --log_target="[stdout]"
2022-03-30T05:59:33.146725Z info FLAG: --meshConfig="./etc/istio/config/mesh"
2022-03-30T05:59:33.146730Z info FLAG: --outlierLogPath=""
2022-03-30T05:59:33.146736Z info FLAG: --proxyComponentLogLevel="misc:error"
2022-03-30T05:59:33.146741Z info FLAG: --proxyLogLevel="warning"
2022-03-30T05:59:33.146747Z info FLAG: --serviceCluster="reviews.default"
2022-03-30T05:59:33.146753Z info FLAG: --stsPort="0"
2022-03-30T05:59:33.146760Z info FLAG: --templateFile=""
2022-03-30T05:59:33.146767Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2022-03-30T05:59:33.146784Z info Version 1.8.0-c87a4c874df27e37a3e6c25fa3d1ef6279685d23-Clean
2022-03-30T05:59:33.146991Z info Obtained private IP [10.4.1.6]
2022-03-30T05:59:33.147107Z info Apply proxy config from env {"tracing":{"zipkin":{"address":"caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443"}},"proxyMetadata":{"DNS_AGENT":""}}
2022-03-30T05:59:33.148650Z info Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
proxyMetadata:
DNS_AGENT: ""
serviceCluster: reviews.default
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
zipkin:
address: caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443
2022-03-30T05:59:33.148721Z info Proxy role: &model.Proxy{RWMutex:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, Type:"sidecar", IPAddresses:[]string{"10.4.1.6"}, ID:"reviews-v1-5d6559df86-qbg6b.default", Locality:(*envoy_config_core_v3.Locality)(nil), DNSDomain:"default.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), PrevSidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), IstioVersion:(*model.IstioVersion)(nil), VerifiedIdentity:(*spiffe.Identity)(nil), ipv6Support:false, ipv4Support:false, GlobalUnicastIP:"", XdsResourceGenerator:model.XdsResourceGenerator(nil), WatchedResources:map[string]*model.WatchedResource(nil)}
2022-03-30T05:59:33.148732Z info JWT policy is third-party-jwt
2022-03-30T05:59:33.148777Z info PilotSAN []string{"istiod.istio-system.svc"}
2022-03-30T05:59:33.148827Z info sa.serverOptions.CAEndpoint == istiod.istio-system.svc:15012 Citadel
2022-03-30T05:59:33.148916Z info Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.149082Z info citadelclient Citadel client using custom root: istiod.istio-system.svc:15012 -----BEGIN CERTIFICATE-----
MIIC/DCCAeSgAwIBAgIQOzOVPb98v+UHCpf80MI1pTANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTIyMDMyOTE3MzIyOFoXDTMyMDMy
NjE3MzIyOFowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAO4j6Sa5VoFCUctY/ehMsFfXjejVHE05PzgaTt0x
zGK6WDLd4bQHVxiEERs2bQcPYP55T+AqBo4cyU5BFi7gEvrVdfHDMGdl4f3rhojB
RNdPLw9axyBNulOYBGIOIthpYY45fPLqvADQmU6GIUqcpg83zuwiyufbaCuElVuJ
h3eMebBQL6zsm+4BFZOTECvjMMpH/HSjOKdW/XsUU71FSVPo9q6devzLgCquZemO
kWHGjTtibwPcyRTZiL9FgBMnFF5gXe5K8FauIQlgkTDTWPj99n2FPGrfgEEC+z3q
O12NYi41zdY9RTk7f6kFHTzLRcGQ8ItG9MRebfZSfDqudCsCAwEAAaNCMEAwDgYD
VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFKAN5Ltn7oIN
l+9yoTfvOIvhBdTCMA0GCSqGSIb3DQEBCwUAA4IBAQCOKu1XEvJKXwRR/VNaL19L
iTIsC5csW4Dg1Z8aFQk+1UwroBbsdjCkPiwK0FJKHMoobIOtSbjn9k+OaUfv4pZo
D8dsDznqGJpkkiZ7zviwmpS3+B2YHoKFRs0ZXHu4hC081AUFjfFvFcwjtfPYKSGU
KqtxKPuvXCVGqaPdmkg5J4gG5q+Yutxno4m3VxGVocuHzXI9/Kox2Lz0C3royfF7
XoTxNy08TzkjDPuPCLqYy85zFOM7PzuuuK7ZIkdXpKbStIWLbjkciqLPzwi18JaH
eyS1/hORUC7AKMj8a3fKWrFsRiMu4Mdv+knnQ1ntLqb5Vy85VTvNFAvAB7mwD/NN
-----END CERTIFICATE-----
2022-03-30T05:59:33.219251Z info sds SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"
2022-03-30T05:59:33.219548Z info xdsproxy Initializing with upstream address istiod.istio-system.svc:15012 and cluster Kubernetes
2022-03-30T05:59:33.219346Z info sds Start SDS grpc server
2022-03-30T05:59:33.220303Z info xdsproxy adding watcher for certificate var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.220894Z info Starting proxy agent
2022-03-30T05:59:33.222017Z info Opening status port 15020
2022-03-30T05:59:33.222278Z info Received new config, creating new Envoy epoch 0
2022-03-30T05:59:33.222328Z info Epoch 0 starting
2022-03-30T05:59:33.239683Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster reviews.default --service-node sidecar~10.4.1.6~reviews-v1-5d6559df86-qbg6b.default~default.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error --config-yaml {
"tracing": {
"http": {
"name": "envoy.tracers.zipkin",
"typed_config": {
"@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
"collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=MAYH36IJELZRXTEETKL7QEA7NPA5UNEI",
"collectorEndpointVersion": "HTTP_JSON",
"trace_id_128bit": true,
"shared_span_context": false
}
}
},
"static_resources": {
"clusters": [{
"name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com:443",
"connect_timeout": "5s",
"type": "STRICT_DNS",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"endpoints": [{
"lb_endpoints": [{
"endpoint": {
"address": {
"socket_address": {
"address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"port_value": 443
}
}
}
}]
}]
},
"transport_socket": {
"name": "envoy.transport_sockets.tls",
"typed_config": {
"@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
"sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com"
}
}
}]
}
}
--concurrency 2]
2022-03-30T05:59:33.315619Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.315693Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316469Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316542Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.390651Z info xdsproxy Envoy ADS stream established
2022-03-30T05:59:33.391110Z info xdsproxy connecting to upstream XDS server: istiod.istio-system.svc:15012
2022-03-30T05:59:33.396461Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2022-03-30T05:59:33.478768Z info sds resource:ROOTCA new connection
2022-03-30T05:59:33.479543Z info sds Skipping waiting for gateway secret
2022-03-30T05:59:33.479419Z info sds resource:default new connection
2022-03-30T05:59:33.479917Z info sds Skipping waiting for gateway secret
2022-03-30T05:59:33.682346Z info cache Root cert has changed, start rotating root cert for SDS clients
2022-03-30T05:59:33.682714Z info cache GenerateSecret default
2022-03-30T05:59:33.683386Z info sds resource:default pushed key/cert pair to proxy
2022-03-30T05:59:34.079948Z info cache Loaded root cert from certificate ROOTCA
2022-03-30T05:59:34.080300Z info sds resource:ROOTCA pushed root cert to proxy
2022-03-30T05:59:34.154971Z warning envoy config gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 10.8.14.87_14250: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_20001: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.76_3000: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9411: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.191_15021: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9080: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.5.43_443: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15010: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15014: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.14.87_14268: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_80: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9090: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
virtualInbound: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
2022-03-30T05:59:35.844720Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 reje
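The rejection in the log repeats the same "unknown cluster" message once per listener, which makes it easy to miss which cluster name Envoy was actually looking for. A minimal sketch for condensing such output is shown below; it assumes the sidecar log has been saved as text (for example from kubectl logs), and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches per-listener rejection lines such as:
#   0.0.0.0_9411: envoy.tracers.zipkin: unknown cluster 'some.host'
PATTERN = re.compile(r"(\S+): envoy\.tracers\.zipkin: unknown cluster '([^']+)'")


def unknown_clusters(log_text):
    """Map each missing cluster name to the listeners rejected because of it."""
    found = defaultdict(list)
    for listener, cluster in PATTERN.findall(log_text):
        found[cluster].append(listener)
    return dict(found)


sample = (
    "warning envoy config ... Error adding/updating listener(s) "
    "10.8.14.87_14250: envoy.tracers.zipkin: unknown cluster 'apm.example'\n"
    "0.0.0.0_9411: envoy.tracers.zipkin: unknown cluster 'apm.example'\n"
)
for cluster, listeners in unknown_clusters(sample).items():
    print(cluster, "rejected on", len(listeners), "listeners")
```

The cluster name reported here is the value of collector_cluster; comparing it character by character with the "name" of the cluster in static_resources shows whether the two really match.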
After 2-3 days of debugging, I was able to resolve the distributed tracing issue with Istio, Zipkin and OCI APM.
Note: It did not work with the root user, so I created a compartment in OCI, created an IAM policy and a group, and gave the group full access to the compartment.
I then added the root user to the group, and oddly it started working, whereas the root user on its own with the default policy did not.
Reference doc for the policy: https://docs-uat.us.oracle.com/en/cloud/paas/application-performance-monitoring/apmgn/perform-oracle-cloud-infrastructure-prerequisite-tasks.html
Working ConfigMap for the sidecar
The connect_timeout key is required; without it the sidecar fails and the pods never reach the Ready state. The :443 suffix that the official documentation adds to the cluster name is not required. In the failing log from the question, the cluster was named with :443 while collector_cluster had no suffix, and Envoy rejected the listeners with "unknown cluster".
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-custom-bootstrap-config
data:
  custom_bootstrap.json: |
    {
      "tracing": {
        "http": {
          "name": "envoy.tracers.zipkin",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
            "collector_cluster": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
            "collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=M7SOSHXXXXXXXXXXXXXXXXXXXUZEHOGRSA",
            "collector_endpoint_version": "HTTP_JSON",
            "trace_id_128bit": true,
            "shared_span_context": false
          }
        }
      },
      "static_resources": {
        "clusters": [{
          "name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
          "type": "STRICT_DNS",
          "connect_timeout": "5s",
          "lb_policy": "ROUND_ROBIN",
          "load_assignment": {
            "cluster_name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
            "endpoints": [{
              "lb_endpoints": [{
                "endpoint": {
                  "address": {
                    "socket_address": {
                      "address": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
                      "port_value": 443
                    }
                  }
                }
              }]
            }]
          },
          "transport_socket": {
            "name": "envoy.transport_sockets.tls",
            "typed_config": {
              "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
              "sni": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com"
            }
          }
        }]
      }
    }
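The two problems described here (a missing connect_timeout and a cluster name that does not match collector_cluster) can be caught before the ConfigMap is applied. The following is only a sketch under a few assumptions: the bootstrap JSON has already been loaded into a Python dict, the keys use the snake_case spellings shown in the ConfigMap, and the function name is hypothetical.

```python
def check_bootstrap(bootstrap):
    """Return a list of problems found in a custom Envoy bootstrap dict."""
    problems = []
    clusters = bootstrap.get("static_resources", {}).get("clusters", [])
    names = {c.get("name") for c in clusters}

    tracer = bootstrap.get("tracing", {}).get("http", {}).get("typed_config", {})
    wanted = tracer.get("collector_cluster")
    # The tracer looks the cluster up by exact name, so "host" and "host:443" differ.
    if wanted not in names:
        problems.append(f"collector_cluster {wanted!r} has no matching static cluster")

    for c in clusters:
        if "connect_timeout" not in c:
            problems.append(f"cluster {c.get('name')!r} is missing connect_timeout")
    return problems


# Mirrors the failing setup: name carries :443 and connect_timeout is absent.
broken = {
    "tracing": {"http": {"typed_config": {"collector_cluster": "apm.example"}}},
    "static_resources": {"clusters": [{"name": "apm.example:443"}]},
}
for problem in check_bootstrap(broken):
    print(problem)
```

Running it against the working ConfigMap contents (after json.loads on the custom_bootstrap.json value) should report no problems, since the names match and connect_timeout is set.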
Istio config
Setting sampling: 100 samples 100% of requests, so essentially all traces are pushed to Zipkin and the OCI APM domain. I also set enableTracing: true.
Read more at: https://istio.io/latest/docs/tasks/observability/distributed-tracing/mesh-and-proxy-config/
data:
  mesh: |-
    accessLogFile: /dev/stdout
    enableTracing: true
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      proxyMetadata: {}
      tracing:
        sampling: 100
        zipkin:
          address: aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443
    enablePrometheusMerge: true
    rootNamespace: istio-system
    outboundTrafficPolicy:
      mode: ALLOW_ANY
    trustDomain: cluster.local
  meshNetworks: 'networks: {}'
OCI console