python, python-requests

How do I control hostname resolution in Python's requests library, similar to curl's --resolve flag?


I am writing some Python client code and, due to environmental constraints, I want to specify a URL and also control which IP address its hostname resolves to. I can accomplish this with curl by using the --resolve flag. Is there a way to do something similar with Python's requests library?

Ideally this would work in Python 2.7 but I can make a 3.x solution work as well.


Solution

  • After doing a bit of digging, I (unsurprisingly) found that Requests resolves hostnames by asking Python to do it, which in turn asks your operating system. First I found some sample code for hijacking DNS resolution (the question "Tell urllib2 to use custom DNS"), and then I worked out a few more details about how Python resolves hostnames from the socket documentation. After that it was just a matter of wiring everything together:

    import socket
    import requests
    
    def is_ipv4(s):
        # Feel free to improve this: https://stackoverflow.com/questions/11827961/checking-for-ip-addresses
        return ':' not in s
    
    dns_cache = {}
    
    def add_custom_dns(domain, port, ip):
        key = (domain, port)
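        # Each entry mimics one result tuple from `socket.getaddrinfo(...)`:
        # (family, type, proto, canonname, sockaddr). See:
        # https://docs.python.org/2/library/socket.html#socket.getaddrinfo
        # `socket.AF_INET` / `socket.AF_INET6` exist on both Python 2.7 and 3.x
        # (`socket.AddressFamily` is Python 3.4+ only). SOCK_STREAM / IPPROTO_TCP
        # are used because HTTP runs over TCP.
        if is_ipv4(ip):
            value = (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port))
        else: # ipv6
            value = (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port, 0, 0))
        dns_cache[key] = [value]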
    
    # Inspired by: https://stackoverflow.com/a/15065711/868533
    prv_getaddrinfo = socket.getaddrinfo
    def new_getaddrinfo(*args, **kwargs):
        # Uncomment to see what calls to `getaddrinfo` look like.
        # print(args, kwargs)
        try:
            return dns_cache[args[:2]] # hostname and port
        except KeyError:
            # Not overridden: fall back to the real resolver.
            return prv_getaddrinfo(*args, **kwargs)
    
    socket.getaddrinfo = new_getaddrinfo
    
    # Redirect example.com to the IP of test.domain.com (completely unrelated).
    add_custom_dns('example.com', 80, '66.96.162.92')
    res = requests.get('http://example.com')
    print(res.text) # Prints out the HTML of test.domain.com.
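
The patch above stays in place for the whole process once it is installed. If you only need the override around a few requests, the same idea can be wrapped in a context manager that restores the original resolver afterwards. This is a minimal sketch, not part of the original answer: the name `resolve_override` is hypothetical, and it assumes the caller passes the hostname and port positionally to `socket.getaddrinfo`.

```python
import contextlib
import socket


@contextlib.contextmanager
def resolve_override(host, port, ip):
    """Temporarily resolve (host, port) to `ip`, like curl's --resolve host:port:ip.

    `resolve_override` is a hypothetical helper name used for illustration.
    """
    # Crude IPv4/IPv6 check, same heuristic as `is_ipv4` in the answer.
    family = socket.AF_INET6 if ':' in ip else socket.AF_INET
    sockaddr = (ip, port, 0, 0) if family == socket.AF_INET6 else (ip, port)
    pinned = [(family, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', sockaddr)]
    original = socket.getaddrinfo

    def patched(*args, **kwargs):
        # Only intercept lookups for the pinned (host, port) pair.
        if args[:2] == (host, port):
            return pinned
        return original(*args, **kwargs)

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raises.
        socket.getaddrinfo = original
```

Usage would look like `with resolve_override('example.com', 80, '66.96.162.92'): res = requests.get('http://example.com')`. The patch is still global while the block runs, so other threads resolving the same host and port during that window see the override too.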
    

    Some caveats I ran into while writing this: