I'm running PowerDNS Recursor inside my k8s cluster. My Python script runs in a different pod and does rDNS lookups against my PowerDNS Recursor app. I have my HPA max replicas set to 8. However, I do not think the load is the problem here. I'm unsure what to do to resolve the timeout error that I'm getting below. I can increase the replicas to solve the problem temporarily, but then it happens again.
[ipmetadata][MainThread][source.py][144][WARNING]: dns_error code=12, message=Timeout while contacting DNS servers
It seems like my pods are rejecting incoming traffic, which is why the script is logging dns_error code=12.
Here is the part of my script that runs the rdns lookup:
return_value = {
    'rdns': None
}
try:
    async for attempt in AsyncRetrying(stop=stop_after_attempt(3)):
        with attempt:
            try:
                if ip:
                    result = await self._resolver.query(ip_address(ip).reverse_pointer, 'PTR')
                    return_value['rdns'] = result.name
                    return return_value
            except DNSError as dns_error:
                # 1 = DNS server returned answer with no data
                # 4 = Domain name not found
                # (seems to just be a failure of rdns lookup no sense in retrying)
                # 11 = Could not contact DNS servers
                if int(dns_error.args[0]) in [1, 4, 11]:
                    return return_value
                LOG.warning('dns_error code=%d, message=%s, ip=%s', dns_error.args[0], dns_error.args[1], ip)
                raise
except RetryError as retry_ex:
    inner_exception = retry_ex.last_attempt.exception()
    if isinstance(inner_exception, DNSError):
        # 12 = Timeout while contacting DNS servers
        LOG.error('dns_error code=%d, message=%s, ip=%s', inner_exception.args[0], inner_exception.args[1], ip)
    else:
        LOG.exception('rdns lookup failed')
return return_value
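Error code 12 is a timeout reported by the resolver in your Python script: it sent the query to the recursor and got no answer before its own timeout expired. That does not necessarily mean the pods are rejecting traffic. The recursor may itself still be waiting on slow or unresponsive authoritative servers for the queried name (common for reverse zones), or the cause may be network issues, firewall rules, rate limiting, or misconfiguration of the recursor.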
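The numeric codes in your comments (1, 4, 11, 12) and their messages match the c-ares status codes that aiodns/pycares expose. Assuming that is the library behind self._resolver (the question does not say), a small lookup table makes the retry decision explicit. The names below come from c-ares; the classify helper and its retry policy are hypothetical and simply mirror what your script already does.

```python
# c-ares status codes that appear in the question (values as defined by c-ares).
ARES_CODES = {
    1: "ARES_ENODATA",        # server answered, but with no data
    4: "ARES_ENOTFOUND",      # domain name not found
    11: "ARES_ECONNREFUSED",  # could not contact DNS servers
    12: "ARES_ETIMEOUT",      # timeout while contacting DNS servers
}

# Hypothetical policy mirroring the script: these codes are not retried.
GIVE_UP = {1, 4, 11}


def classify(code: int) -> tuple[str, bool]:
    """Return (symbolic name, should_retry) for a DNS error code."""
    name = ARES_CODES.get(code, "UNKNOWN")
    return name, code not in GIVE_UP


for code in (1, 4, 11, 12):
    print(code, *classify(code))
```

Only code 12 reaches the retry path, so three timed-out attempts in a row are what produce the final error in your log.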
There are a few things you can try to resolve this timeout error:
- Check network connectivity. Use ping, traceroute, or dig to diagnose network problems.
- Check firewall rules. Use iptables, nftables, or ufw to manage firewall rules.
- Check rate limiting. Review any rate limiting or throttling on the recursor and on the servers it queries; rec_control is the Recursor's control tool (pdnsutil and pdns_control belong to the PowerDNS authoritative server).
- Check configuration. Review the recursor's configuration file and settings, again with rec_control.
Here are some examples of how to use the tools mentioned above to troubleshoot the timeout error:
import subprocess
recursor_pod_ip = "10.0.0.1" # replace with the actual IP address of the recursor pod
ping_result = subprocess.run(["ping", "-c", "4", recursor_pod_ip], capture_output=True)
print(ping_result.stdout.decode())
This will send four ICMP packets to the recursor pod and print the output. You should see something like this:
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.123 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.098 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.102 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.101 ms
--- 10.0.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3060ms
rtt min/avg/max/mdev = 0.098/0.106/0.123/0.010 ms
This indicates that basic network connectivity and latency between the python pod and the recursor pod are good. It does not by itself show that DNS traffic on port 53 is being accepted and answered.
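Since ping only exercises ICMP, it can help to send a real PTR query over UDP and time the answer. The following is a minimal sketch using only the standard library. The function names are hypothetical, it assumes the recursor is reachable over IPv4 on UDP port 53, and it only reads the response code from the reply rather than parsing the answer.

```python
import ipaddress
import socket
import struct
import time


def build_ptr_query(ip: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS PTR query packet for the given IP address."""
    name = ipaddress.ip_address(ip).reverse_pointer
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    # QTYPE 12 = PTR, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 12, 1)


def time_ptr_lookup(server: str, ip: str, timeout: float = 2.0, port: int = 53):
    """Send one PTR query; return (seconds, rcode) or None on timeout."""
    packet = build_ptr_query(ip)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, port))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        elapsed = time.monotonic() - start
    if len(data) < 12 or data[:2] != packet[:2]:
        raise ValueError("unexpected reply from DNS server")
    return elapsed, data[3] & 0x0F  # low 4 bits of the flags hold the RCODE
```

Calling time_ptr_lookup("10.0.0.1", "8.8.8.8") from the python pod (with the recursor's real address in place of 10.0.0.1) in a loop should show whether timeouts are constant or tied to particular addresses.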
kubectl exec -it recursor-pod -- traceroute 8.8.8.8
This will trace the route taken by packets from the recursor pod to 8.8.8.8 (Google Public DNS, used here only as a well-known external address; it is a public resolver, not an authoritative server). You should see something like this:
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 10.0.0.1 (10.0.0.1) 0.123 ms 0.098 ms 0.102 ms
2 10.0.1.1 (10.0.1.1) 0.456 ms 0.432 ms 0.419 ms
3 10.0.2.1 (10.0.2.1) 0.789 ms 0.765 ms 0.752 ms
4 192.168.0.1 (192.168.0.1) 1.123 ms 1.098 ms 1.085 ms
5 192.168.1.1 (192.168.1.1) 1.456 ms 1.432 ms 1.419 ms
6 192.168.2.1 (192.168.2.1) 1.789 ms 1.765 ms 1.752 ms
7 192.168.3.1 (192.168.3.1) 2.123 ms 2.098 ms 2.085 ms
8 192.168.4.1 (192.168.4.1) 2.456 ms 2.432 ms 2.419 ms
9 192.168.5.1 (192.168.5.1) 2.789 ms 2.765 ms 2.752 ms
10 8.8.8.8 (8.8.8.8) 3.123 ms 3.098 ms 3.085 ms
This indicates that the recursor pod can reach hosts outside the cluster. It does not prove that DNS traffic to every authoritative server is unfiltered, so treat it as a basic reachability check only.
kubectl exec -it recursor-pod -- dig example.com
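This will send a DNS query for the domain name example.com from inside the recursor pod, using whichever resolver the pod's /etc/resolv.conf points at. To query the recursor process itself, add its address, for example @127.0.0.1 if it listens on localhost. You should see something like this: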
; <<>> DiG 9.11.5-P4-5.1ubuntu2.1-Ubuntu <<>> example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
example.com. 3600 IN A 93.184.216.34
;; Query time: 12 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Tue Jun 15 12:34:56 UTC 2021
;; MSG SIZE rcvd: 56
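This indicates that the server shown on the SERVER line returned a valid response (status NOERROR) for the domain name example.com. Since your script does reverse lookups, it is worth repeating the test with dig -x and an IP address that times out in your script.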
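To repeat that check for many addresses, you could wrap dig in a small Python helper. This is only a sketch: the function names are hypothetical, it assumes dig is installed in the pod where the script runs, and the server address is a placeholder. The +short, +time, +tries and -x options are standard dig options.

```python
import subprocess


def build_dig_command(ip: str, server: str, timeout_s: int = 2, tries: int = 1) -> list[str]:
    """Build a dig command line for a reverse (PTR) lookup against one server."""
    return [
        "dig",
        "+short",
        f"+time={timeout_s}",  # per-try timeout in seconds
        f"+tries={tries}",     # number of UDP attempts
        f"@{server}",
        "-x",
        ip,
    ]


def reverse_lookup(ip: str, server: str, timeout_s: int = 2, tries: int = 1):
    """Run dig and return (exit code, stripped stdout)."""
    cmd = build_dig_command(ip, server, timeout_s, tries)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout.strip()


print(" ".join(build_dig_command("8.8.8.8", "10.0.0.1")))
```

A non-zero exit code from reverse_lookup together with empty output points at the same kind of timeout your script reports, which lets you reproduce it outside the script.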
kubectl exec -it recursor-pod -- rec_control get-parameter max-cache-entries max-packetcache-entries max-recursion-depth max-tcp-clients max-udp-queries-per-round
This will print the current values of the named configuration parameters. Note that the Recursor is controlled with rec_control; pdns_control talks to the PowerDNS authoritative server. You should look for settings such as:
max-cache-entries=1000000
max-packetcache-entries=500000
max-recursion-depth=40
max-tcp-clients=128
max-udp-queries-per-round=1000
These settings control the maximum number of cache entries, packet cache entries, recursion depth, TCP clients, and UDP queries processed per round. You can adjust them according to your needs and resources. Most of them are set in the recursor's configuration file and take effect after a restart of the pod; a few can be changed at runtime, for example:
kubectl exec -it recursor-pod -- rec_control set-max-cache-entries 2000000
This will set the maximum number of cache entries to 2000000.
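If you want to compare settings across several recursor pods, a short script can parse key=value lines like the ones listed above and flag values below a threshold. This is a sketch only: the helper names are hypothetical and the thresholds in the example are placeholders, not recommendations.

```python
def parse_settings(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def below_minimum(settings: dict[str, str], minimums: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {name: (actual, wanted)} for integer settings below the wanted minimum."""
    problems = {}
    for name, wanted in minimums.items():
        raw = settings.get(name)
        if raw is None or not raw.isdigit():
            continue  # setting missing or not a plain integer
        if int(raw) < wanted:
            problems[name] = (int(raw), wanted)
    return problems


sample = """
max-cache-entries=1000000
max-tcp-clients=128
"""
# Placeholder thresholds, chosen only to demonstrate the check.
print(below_minimum(parse_settings(sample), {"max-tcp-clients": 256}))
```

Feeding it the captured output from each pod makes it easy to spot a replica that started with different settings.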
dig +short CHAOS TXT version.bind @8.8.8.8
This will ask the server at 8.8.8.8 for its software version string using a CHAOS-class TXT query. Note that 8.8.8.8 is Google Public DNS, a public recursive resolver rather than an authoritative server, and many servers hide or refuse version.bind, so an empty answer is not a sign of trouble. A server that does answer returns a quoted string describing its software. You can also ask the server to identify itself:
dig +short CHAOS TXT id.server @8.8.8.8
This will ask the server at 8.8.8.8 for its identity string (id.server is defined in RFC 4892). If the server answers, the reply is an operator-chosen label for the instance that handled the query; it says nothing about DNSSEC or EDNS0 support. To check those, run dig +dnssec against the server and look at the OPT pseudosection and the ad flag in the response.