python urllib2

How do you get default headers in a urllib2 Request?


I have a Python web client that uses urllib2. It is easy enough to add HTTP headers to my outgoing requests. I just create a dictionary of the headers I want to add, and pass it to the Request initializer.
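
For context, that setup looks roughly like the sketch below. It uses the Python 3 module name (`urllib.request`, the successor to urllib2), and the URL and header values are placeholders chosen only for illustration.

```python
from urllib import request

# Placeholder URL and header values, chosen only for illustration.
custom_headers = {"X-Client-Id": "demo", "Accept": "application/json"}
req = request.Request("http://www.example.com/", headers=custom_headers)

# Before the request is sent, the Request object only knows about the
# headers passed in explicitly (it stores their names capitalized).
for name, value in req.header_items():
    print(name, value)
```

At this point `header_items()` lists only the two custom headers; nothing has been sent yet.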

However, other "standard" HTTP headers get added to the request in addition to the custom ones I explicitly add. When I sniff the request using Wireshark, I see headers besides the ones I added myself. My question is: how do I get access to these headers? I want to log every request (including the full set of HTTP headers), and I can't figure out how.

Any pointers?

In a nutshell: How do I get all the outgoing headers from an HTTP request created by urllib2?


Solution

  • If you want to see the literal HTTP request that is sent out, and therefore see every last header exactly as it is represented on the wire, then you can tell urllib2 to use your own version of an HTTPHandler that prints out (or saves, or whatever) the outgoing HTTP request.

    # For Python 2, switch these imports to:
    #
    # import httplib as client
    # import urllib2 as request
    
    from http import client
    from urllib import request
    
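    class MyHTTPConnection(client.HTTPConnection):
        def send(self, s):
            # The request line and headers arrive here as one chunk of bytes.
            # 'replace' keeps non-UTF-8 bytes in a request body from raising.
            print(s.decode('utf-8', 'replace'))  # or save them, or whatever
            client.HTTPConnection.send(self, s)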
    
    class MyHTTPHandler(request.HTTPHandler):
        def http_open(self, req):
            return self.do_open(MyHTTPConnection, req)
    
    opener = request.build_opener(MyHTTPHandler)
    response = opener.open('http://www.google.com/')
    

    The result of running this code is:

    GET / HTTP/1.1
    Accept-Encoding: identity
    Host: www.google.com
    User-Agent: Python-urllib/3.9
    Connection: close
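
Since the stated goal is to log each request rather than print it, the same override can keep a copy of what it sends. The following is only a minimal sketch for Python 3 under stated assumptions: `RecordingHTTPConnection`, `RecordingHTTPHandler`, `build_recording_opener`, and the class-level `sent` list are hypothetical names introduced here, not part of urllib. The list is class-level because the handler creates the connection internally, so the instance is not easy to reach from calling code.

```python
from http import client
from urllib import request


class RecordingHTTPConnection(client.HTTPConnection):
    """HTTPConnection that keeps a text copy of the bytes it sends."""

    # Hypothetical class-level store; a logging call could replace it.
    sent = []

    def send(self, data):
        # The request line and headers arrive as a single bytes object.
        # A request body may instead be a file-like object or an iterable,
        # so only bytes-like chunks are recorded.
        if isinstance(data, (bytes, bytearray)):
            # ISO-8859-1 maps every byte to a character, so this cannot fail.
            text = bytes(data).decode("iso-8859-1")
            RecordingHTTPConnection.sent.append(text)
        super().send(data)


class RecordingHTTPHandler(request.HTTPHandler):
    """Handler that makes urllib use the recording connection for http://."""

    def http_open(self, req):
        return self.do_open(RecordingHTTPConnection, req)


def build_recording_opener():
    """Return an opener whose plain-HTTP requests are recorded."""
    return request.build_opener(RecordingHTTPHandler)
```

With this in place, `build_recording_opener().open('http://www.google.com/')` should leave the request line and headers in `RecordingHTTPConnection.sent[0]`. The sketch only covers `http://` URLs; HTTPS would need a matching pair built on `client.HTTPSConnection` and `request.HTTPSHandler`.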