I am writing some python client code and, due to some environmental constraints, I want to specify a URL and also control how it is resolved. I can accomplish this with curl by using the --resolve flag. Is there a way to do something similar with Python's requests library?
Ideally this would work in Python 2.7 but I can make a 3.x solution work as well.
After doing a bit of digging, I (unsurprisingly) found that Requests resolves hostnames by asking Python to do it (which is asking your operating system to do it). First I found some sample code to hijack DNS resolution (Tell urllib2 to use custom DNS) and then I figured out a few more details about how Python resolves hostnames in the socket documentation. Then it was just a matter of wiring everything together:
import socket
import requests
def is_ipv4(s):
# Feel free to improve this: https://stackoverflow.com/questions/11827961/checking-for-ip-addresses
return ':' not in s
dns_cache = {}
def add_custom_dns(domain, port, ip):
key = (domain, port)
# Strange parameters explained at:
# https://docs.python.org/2/library/socket.html#socket.getaddrinfo
# Values were taken from the output of `socket.getaddrinfo(...)`
if is_ipv4(ip):
value = (socket.AddressFamily.AF_INET, 0, 0, '', (ip, port))
else: # ipv6
value = (socket.AddressFamily.AF_INET6, 0, 0, '', (ip, port, 0, 0))
dns_cache[key] = [value]
# Inspired by: https://stackoverflow.com/a/15065711/868533
prv_getaddrinfo = socket.getaddrinfo
def new_getaddrinfo(*args):
# Uncomment to see what calls to `getaddrinfo` look like.
# print(args)
try:
return dns_cache[args[:2]] # hostname and port
except KeyError:
return prv_getaddrinfo(*args)
socket.getaddrinfo = new_getaddrinfo
# Redirect example.com to the IP of test.domain.com (completely unrelated).
add_custom_dns('example.com', 80, '66.96.162.92')
res = requests.get('http://example.com')
print(res.text) # Prints out the HTML of test.domain.com.
Some caveats I ran into while writing this:
https
. The code works fine (just use https://
and 443
instead of http://
and 80
). However, SSL certificates are tied to domain names and Requests is going to try validating the name on the certificate to the original domain you tried connecting to.getaddrinfo
returns slightly different info for IPv4 and IPv6 addresses. My implementation for is_ipv4
feels hacky to me and I strongly recommend a better version if you're using this in a real application.