kubernetes, k3s, dgraph, k3d

Failed to install Dgraph. Got "Error while dialing dial tcp: lookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local: no such host"


I am trying to install Dgraph. Here is what I did:

I created a dev cluster with k3d by running:

k3d cluster create dev --config=dev-cluster-config.yaml

Here is the dev-cluster-config.yaml file:

apiVersion: k3d.io/v1alpha2
kind: Simple
kubeAPI:
  hostPort: "6440"
network: hm-network
ports:
  - port: 40000:80
    nodeFilters:
      - loadbalancer
options:
  k3s:
    extraServerArgs:
      - --no-deploy=traefik
      - --cluster-domain=dev.k8s-hongbomiao.com

Then I installed Dgraph by running:

kubectl create namespace hm
kubectl apply --namespace=hm --filename=https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml

As you can see, my dgraph-alpha-0 pod has an issue (see the kubectl get pod output further below).

Here is my dgraph-alpha-0 log:

++ hostname -f
+ dgraph alpha --my=dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080 --zero dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080,dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080,dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080
[Sentry] 2021/08/05 19:16:15 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 19:16:15 Integration installed: Environment
[Sentry] 2021/08/05 19:16:15 Integration installed: Modules
[Sentry] 2021/08/05 19:16:15 Integration installed: IgnoreErrors
[Sentry] 2021/08/05 19:16:16 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 19:16:16 Integration installed: Environment
[Sentry] 2021/08/05 19:16:16 Integration installed: Modules
[Sentry] 2021/08/05 19:16:16 Integration installed: IgnoreErrors
I0805 19:16:16.205382      19 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0805 19:16:16.396706      19 init.go:110] 

Dgraph version   : v21.03.1
Dgraph codename  : rocket-1
Dgraph SHA-256   : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1     : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.


I0805 19:16:16.396764      19 run.go:752] x.Config: {PortOffset:0 Limit:disallow-drop=false; txn-abort-after=5m; max-pending-queries=10000; query-edge=1000000; mutations-nquad=1000000; query-timeout=0ms; max-retries=-1; mutations=allow; normalize-node=10000 LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false}
I0805 19:16:16.396828      19 run.go:753] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080 ZeroAddr:[dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080 dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080 dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256; idx=; group= Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc0001cab50 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2021-08-05 19:16:15.798430129 +0000 UTC m=+0.265114410 Ludicrous:enabled=false; concurrency=2000 LudicrousEnabled:false Security:token=; whitelist= EncryptionKey:**** LogRequest:0 HardSync:false Audit:false}
I0805 19:16:16.396923      19 run.go:754] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I0805 19:16:16.397085      19 log.go:295] Found file: 1 First Index: 0
I0805 19:16:16.399098      19 storage.go:125] Init Raft Storage with snap: 0, first: 1, last: 0
I0805 19:16:16.399128      19 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x33e3080 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I0805 19:16:16.425051      19 log.go:34] All 0 tables opened in 0s
I0805 19:16:16.427041      19 log.go:34] Discard stats nextEmptySlot: 0
I0805 19:16:16.427190      19 log.go:34] Set nextTxnTs to 0
I0805 19:16:16.431935      19 groups.go:99] Current Raft Id: 0x0
I0805 19:16:16.432064      19 worker.go:114] Worker listening at address: [::]:7080
I0805 19:16:16.432003      19 groups.go:115] Sending member request to Zero: addr:"dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080" 
I0805 19:16:16.434244      19 run.go:565] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0805 19:16:16.434300      19 run.go:566] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0805 19:16:16.434320      19 run.go:593] gRPC server started.  Listening on port 9080
I0805 19:16:16.434330      19 run.go:594] HTTP server started.  Listening on port 8080
E0805 19:16:16.434370      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0805 19:16:16.534017      19 pool.go:162] CONNECTING to dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080
W0805 19:16:16.537177      19 pool.go:267] Connection lost with dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local: no such host"
I0805 19:16:16.737992      19 pool.go:162] CONNECTING to dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080
W0805 19:16:16.740397      19 pool.go:267] Connection lost with dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.hm.svc.cluster.local: no such host"
I0805 19:16:17.141647      19 pool.go:162] CONNECTING to dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080
W0805 19:16:17.144077      19 pool.go:267] Connection lost with dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-2.dgraph-zero.hm.svc.cluster.local: no such host"
E0805 19:16:17.435876      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:18.436001      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:19.436210      19 groups.go:1181] Error during SubscribeForUpdates for prefix 

Note the repeated DNS errors in the log:

transport: Error while dialing dial tcp: lookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local: no such host
transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.hm.svc.cluster.local: no such host
transport: Error while dialing dial tcp: lookup dgraph-zero-2.dgraph-zero.hm.svc.cluster.local: no such host


However, they do actually exist:

➜ kubectl get pod --context=k3d-dev -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   metrics-server-86cbb8457f-8jzsd           1/1     Running   0          11m
kube-system   local-path-provisioner-5ff76fc89d-t29w8   1/1     Running   0          11m
kube-system   coredns-7448499f4d-w9htt                  1/1     Running   0          11m
hm            dgraph-zero-0                             1/1     Running   0          4m26s
hm            dgraph-zero-1                             1/1     Running   0          3m13s
hm            dgraph-zero-2                             1/1     Running   0          2m50s
hm            dgraph-alpha-0                            0/1     Running   2          4m26s
➜ kubectl get svc --context=k3d-dev -A
NAMESPACE     NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes            ClusterIP   10.43.0.1      <none>        443/TCP                  10m
kube-system   kube-dns              ClusterIP   10.43.0.10     <none>        53/UDP,53/TCP,9153/TCP   10m
kube-system   metrics-server        ClusterIP   10.43.29.122   <none>        443/TCP                  10m
hm            dgraph-zero-public    ClusterIP   10.43.9.50     <none>        5080/TCP,6080/TCP        3m30s
hm            dgraph-alpha-public   ClusterIP   10.43.127.25   <none>        8080/TCP,9080/TCP        3m30s
hm            dgraph-zero           ClusterIP   None           <none>        5080/TCP                 3m30s
hm            dgraph-alpha          ClusterIP   None           <none>        7080/TCP                 3m30s

My other services in this cluster can talk to each other without any issue.
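As an extra sanity check, the headless-service records should exist under the custom cluster domain, while the hard-coded cluster.local names should not resolve. A quick way to verify this from inside the cluster (just a sketch; the dns-check pod name and the busybox image are arbitrary choices on my part):

# Quick DNS check from a throwaway pod (busybox:1.28 ships a working nslookup).
# This name should resolve, because the cluster domain is dev.k8s-hongbomiao.com:
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup dgraph-zero-0.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com

# This one should fail with "can't resolve", matching the "no such host" in the Dgraph log:
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local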

I also found that if I remove --cluster-domain=dev.k8s-hongbomiao.com, or change it to --cluster-domain=cluster.local, when creating the cluster with k3d, Dgraph HA installs without any issue (see the config sketch below).
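For reference, this is the variant of the options block in dev-cluster-config.yaml that works; dropping the --cluster-domain line entirely works as well:

options:
  k3s:
    extraServerArgs:
      - --no-deploy=traefik
      - --cluster-domain=cluster.local  # or omit this line entirely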

However, I need to set the cluster domain for some other cluster-related work.

How can I install Dgraph HA when a custom cluster domain is set? Thanks!

UPDATE:

I found this also happens with the Dgraph single-server version (Dgraph Alpha and Dgraph Zero run in the same pod), when I installed it with:

kubectl create namespace hm
kubectl apply --namespace=hm --filename=https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml

Here is the dgraph-0 log:
++ hostname -f
+ dgraph alpha --my=dgraph-0.dgraph.hm.svc.dev.k8s-hongbomiao.com:7080 --zero dgraph-0.dgraph.hm.svc.cluster.local:5080
[Sentry] 2021/08/05 22:38:20 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 22:38:20 Integration installed: Environment
[Sentry] 2021/08/05 22:38:20 Integration installed: Modules
[Sentry] 2021/08/05 22:38:20 Integration installed: IgnoreErrors
[Sentry] 2021/08/05 22:38:21 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 22:38:21 Integration installed: Environment
[Sentry] 2021/08/05 22:38:21 Integration installed: Modules
[Sentry] 2021/08/05 22:38:21 Integration installed: IgnoreErrors
I0805 22:38:21.926193      19 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0805 22:38:22.128588      19 init.go:110] 

Dgraph version   : v21.03.1
Dgraph codename  : rocket-1
Dgraph SHA-256   : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1     : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.


I0805 22:38:22.128647      19 run.go:752] x.Config: {PortOffset:0 Limit:mutations=allow; query-edge=1000000; disallow-drop=false; query-timeout=0ms; txn-abort-after=5m; max-pending-queries=10000; normalize-node=10000; mutations-nquad=1000000; max-retries=-1 LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false}
I0805 22:38:22.128782      19 run.go:753] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:dgraph-0.dgraph.hm.svc.dev.k8s-hongbomiao.com:7080 ZeroAddr:[dgraph-0.dgraph.hm.svc.cluster.local:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256; idx=; group= Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc0003961f0 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2021-08-05 22:38:21.453562539 +0000 UTC m=+0.311585012 Ludicrous:enabled=false; concurrency=2000 LudicrousEnabled:false Security:token=; whitelist= EncryptionKey:**** LogRequest:0 HardSync:false Audit:false}
I0805 22:38:22.129055      19 run.go:754] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I0805 22:38:22.130677      19 storage.go:125] Init Raft Storage with snap: 0, first: 1, last: 0
I0805 22:38:22.130783      19 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x33e3080 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I0805 22:38:22.145325      19 log.go:34] All 0 tables opened in 0s
I0805 22:38:22.147849      19 log.go:34] Discard stats nextEmptySlot: 0
I0805 22:38:22.147915      19 log.go:34] Set nextTxnTs to 0
I0805 22:38:22.150591      19 groups.go:99] Current Raft Id: 0x0
I0805 22:38:22.150605      19 worker.go:114] Worker listening at address: [::]:7080
I0805 22:38:22.150643      19 groups.go:115] Sending member request to Zero: addr:"dgraph-0.dgraph.hm.svc.dev.k8s-hongbomiao.com:7080" 
I0805 22:38:22.153357      19 run.go:565] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0805 22:38:22.153486      19 run.go:566] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0805 22:38:22.153549      19 run.go:593] gRPC server started.  Listening on port 9080
I0805 22:38:22.153584      19 run.go:594] HTTP server started.  Listening on port 8080
E0805 22:38:22.153632      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0805 22:38:22.251622      19 pool.go:162] CONNECTING to dgraph-0.dgraph.hm.svc.cluster.local:5080
W0805 22:38:22.472690      19 pool.go:267] Connection lost with dgraph-0.dgraph.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-0.dgraph.hm.svc.cluster.local: no such host"
E0805 22:38:23.154976      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 22:38:24.155661      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 22:38:25.156576      19 groups.go:1181] Error during SubscribeForUpdates for prefix 

Solution

  • Found the issue: the Dgraph YAML hard-codes .svc.cluster.local (a rough sketch of the kind of change is below).

    Opened the pull request at https://github.com/dgraph-io/dgraph/pull/7976 to resolve the issue.
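For illustration only (this is not necessarily the exact change in the pull request): since the manifest already derives --my from hostname -f, the cluster domain could be derived the same way instead of hard-coding cluster.local. A rough sketch of what the Alpha start command could look like; the CLUSTER_DOMAIN variable and the sed expression are my own illustration, and the hm namespace is written literally:

# Illustrative sketch only, not the exact PR diff.
# hostname -f returns the pod FQDN under the real cluster domain, e.g.
#   dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com
# so the suffix after ".svc." can be reused for the Zero addresses as well.
CLUSTER_DOMAIN=$(hostname -f | sed 's/.*\.svc\.//')   # -> dev.k8s-hongbomiao.com
dgraph alpha \
  --my="$(hostname -f):7080" \
  --zero "dgraph-zero-0.dgraph-zero.hm.svc.${CLUSTER_DOMAIN}:5080,dgraph-zero-1.dgraph-zero.hm.svc.${CLUSTER_DOMAIN}:5080,dgraph-zero-2.dgraph-zero.hm.svc.${CLUSTER_DOMAIN}:5080"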