python-requests urllib3

python-requests - prevent URL encoding


I need to interact with a legacy device whose quirky back-end server expects many query strings to contain literal curly braces, for example: http://example.com/thing/page?{item1}{item2}{item3}

As in many issues and posts I've found, the default behavior of requests and urllib3, which percent-encode the URL, is causing problems.

The suggestions to override the _encode_invalid_chars function don't seem to work.

The suggestions to use python-requests prepared requests don't seem to work either.

https://docs.python-requests.org/en/latest/user/advanced/#prepared-requests

I tried prepared requests as suggested in past GitHub issues and posts on Stack Overflow.

Thank you!

Update: Edited my code block to a working example

Update2: Edited my code block to remove the import alias

Update3: Updated my code block with both working methods

    import requests
    import urllib3.util.url

    def hook_invalid_chars(component, allowed_chars):
        # handle url encode here, or do nothing
        return component


    # Method 1:
    # Warning: this is a hack that replaces urllib3's internal encoding function
    # urllib3.util.url._encode_invalid_chars = hook_invalid_chars

    # Method 2 (requires urllib3 around 2.0.7):
    # Warning: this is a hack to make urllib3 allow otherwise invalid characters
    urllib3.util.url._QUERY_CHARS.add('{')
    urllib3.util.url._QUERY_CHARS.add('}')

    url_base = 'http://127.0.0.1:5470/thing/page'
    url_query = '{item1}{item2}{item3}'

    with requests.Session() as sess:    
        # Note: need to use both the hack above and a prepared request
        req = requests.Request(method='GET', url=url_base)
        prep = req.prepare()
        prep.url += f'?{url_query}'
        
        resp = sess.send(prep)

Solution

  • I traced it with pdb. The code responsible for re-encoding the URL is in urllib3, inside the HTTPConnectionPool.urlopen() method.

    https://github.com/urllib3/urllib3/blob/11f5f5e19bfaab9cdfa2ce9223e62b8f05dd2a99/src/urllib3/connectionpool.py#L723-L727

            # Ensure that the URL we're connecting to is properly encoded
            if url.startswith("/"):
                url = to_str(_encode_target(url))
            else:
                url = to_str(parsed_url.url)
    

    As far as I can tell, it runs unconditionally, so nothing you do in requests can override it.

    The urllib3.HTTPConnectionPool.urlopen docs say: "This is the lowest level call for making a request, so you’ll need to specify all the raw details." So there is no even lower level that could override it.

    Probably relevant issues:

    If you don't care about maintainability and just want it to work, you could reach into internal urllib3 implementation details and force it to do your bidding.

    # HUGE HACK!
    import urllib3.util.url
    urllib3.util.url._QUERY_CHARS.add('{')
    urllib3.util.url._QUERY_CHARS.add('}')
    

    Warning! This is obviously a huge hack! Future upgrades of urllib3 might randomly break it. And it is a global change -- it affects the encoding of all requests made by the current Python process.
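
One way to limit how long the global change is in effect is to add the characters only around the call that needs them and take them out again afterwards. The following is a minimal sketch, assuming `_QUERY_CHARS` is still a plain mutable set in the installed urllib3 version. `temporarily_allowed` is a hypothetical helper name, not part of urllib3 or requests. The change is still process-wide while the block is active, so requests made concurrently from other threads are affected too.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_allowed(char_set, extra_chars):
    """Add extra_chars to char_set for the duration of the with-block.

    Only characters that were not already present are removed on exit,
    so the set ends up exactly as it was before.
    """
    added = set(extra_chars) - char_set
    char_set.update(added)
    try:
        yield char_set
    finally:
        char_set.difference_update(added)


# Intended use with the hack above (assumes urllib3 internals, untested
# across urllib3 releases):
#
#     import urllib3.util.url
#     with temporarily_allowed(urllib3.util.url._QUERY_CHARS, "{}"):
#         resp = sess.send(prep)
```

The helper itself only manipulates a set, so it can be tried on any set before pointing it at urllib3 internals.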

    Another thing to try is urllib or http.client from the Python standard library. They are generally much harder to use, but they might happen not to re-encode the URL.
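
To check the http.client idea without the real device, here is a minimal sketch that starts a throwaway local server, sends the curly-brace query through `http.client`, and reports the request target the server received. As far as I know, `http.client` rejects control characters and whitespace in the target but does not percent-encode it, so treat that as an assumption to verify on your Python version. The names `RecordingHandler`, `fetch_raw` and `run_demo` are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Test handler that remembers the raw request target of each GET."""

    received_paths = []

    def do_GET(self):
        self.received_paths.append(self.path)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Keep the demo quiet; the default writes each request to stderr.
        pass


def fetch_raw(host, port, target):
    """Send GET <target> exactly as given and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", target)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


def run_demo(target="/thing/page?{item1}{item2}{item3}"):
    """Return (status, target as seen by the local test server)."""
    RecordingHandler.received_paths.clear()
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        status = fetch_raw("127.0.0.1", server.server_port, target)
    finally:
        server.shutdown()
        server.server_close()
    return status, RecordingHandler.received_paths[-1]
```

Calling `run_demo()` and comparing the returned target with the one passed in shows whether the braces survived the trip. For the real device, only `fetch_raw` is needed, with the device's host and port.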

    Edited to add: You mentioned overriding the _encode_invalid_chars function. In my testing, both overriding _QUERY_CHARS and overriding _encode_invalid_chars worked (and both have the same disadvantages). Are you sure you're testing it correctly? I used base_url = "http://localhost:5470/foo" and ran nc -l 5470 (netcat) in another terminal, as shown in your link. Your print() might not show the URL that was actually requested, because there could be another encoding step after it.
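
If netcat is not available, a small Python listener can stand in for `nc -l 5470` and show the request line exactly as it arrived on the wire. This is a minimal sketch, not a full HTTP server: it accepts one connection, reads up to the end of the headers, answers with an empty 200 response, and returns the first line. `capture_request_line` and `listen_once` are hypothetical names used only here.

```python
import socket
import threading


def capture_request_line(listener):
    """Accept one connection and return the first line the client sent."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        # Answer so the client does not hang waiting for a response.
        conn.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
        )
    return data.split(b"\r\n", 1)[0].decode("latin-1")


def listen_once(host="127.0.0.1", port=0):
    """Start a one-shot listener in a background thread.

    Returns (port, thread, result); after thread.join(), the raw request
    line is available as result["request_line"].
    """
    listener = socket.create_server((host, port))
    actual_port = listener.getsockname()[1]
    result = {}

    def worker():
        with listener:
            result["request_line"] = capture_request_line(listener)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return actual_port, thread, result
```

To reproduce the netcat check, call `listen_once(port=5470)`, send the request with requests to `http://localhost:5470/foo`, then `thread.join()` and print `result["request_line"]`. Whatever appears there is what the server would see, regardless of what `print(prep.url)` shows.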