python parsing httpx

httpx parsing error when plus sign appears


I'm sending a GET request with the following data to check the banking status of a payment QR code in Brazil:

url = 'https://<bankAPI>/qrcode?qrcode='
data = '00020101021126360014br.gov.bcb.pix0114+55489962650955204000053039865802BR5925TEREZINHA APARECIDA DEL S6008BRASILIA62070503***63040833'

If I put this data into a GET request with response = httpx.get(url + data) and print response.url, I get:

https://<bankAPI>/qrcode?qrcode='00020101021126360014br.gov.bcb.pix0114+55489962650955204000053039865802BR5925TEREZINHA%20APARECIDA%20DEL%20S6008BRASILIA62070503***63040833

As you can see, the spaces are percent-encoded but the plus sign + is not, so I'm forced to encode the data myself before passing it to httpx. I thought the encoding was handled directly by httpx. Is there anything I'm doing wrong?

I tried the requests library and the result is the same, so for now I'm using the quote function to encode the data, but I think HTTPX should handle it directly.


Solution

    Short answer

    Pass the query parameters through the params= argument of httpx.get() instead of concatenating them into the URL. That gives you the percent-encoding you seem to expect:

    import httpx
    # httpx builds and encodes the query string from this dict
    data = {'qrcode': '00020101021126360014br.gov.bcb.pix0114+55489962650955204000053039865802BR5925TEREZINHA APARECIDA DEL S6008BRASILIA62070503***63040833'}
    response = httpx.get('https://<bankAPI>/qrcode', params=data)
    

    Longer explanation

    URL parsing and encoding in HTTPX is handled in httpx/_urlparse.py.

    Which characters get percent-encoded depends on the part of the URL being processed. For the query string (the part you're interested in), this is handled in _urlparse.py:265-267. The safe parameter passed to quote() is a string of characters that should not be percent-encoded for the given URL component.

    In this instance, safe is ultimately set to a string containing the following characters:

    !$&'()*+,;=:/?[]@
    

    Notably, this set includes +, so HTTPX will not encode a plus sign when you build the URL string yourself.
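
    The effect of the safe set can be reproduced with the standard library alone. The sketch below is only an illustration: it uses urllib.parse.quote rather than HTTPX's own code path, and QUERY_SAFE, the two helper functions, and the shortened sample value are hypothetical names and data chosen for the example.

```python
from urllib.parse import quote

# Assumption: this copies the safe set listed above; it is not HTTPX's internal code.
QUERY_SAFE = "!$&'()*+,;=:/?[]@"


def encode_like_raw_url(value: str) -> str:
    """Percent-encode while leaving every character in QUERY_SAFE untouched."""
    return quote(value, safe=QUERY_SAFE)


def encode_as_single_value(value: str) -> str:
    """Percent-encode everything except unreserved characters."""
    return quote(value, safe="")


# Shortened, made-up sample in the spirit of the QR code payload.
sample = "pix0114+5548 BR"

raw_style = encode_like_raw_url(sample)
strict = encode_as_single_value(sample)

print(raw_style)  # pix0114+5548%20BR   ("+" survives, space is encoded)
print(strict)     # pix0114%2B5548%20BR ("+" becomes %2B)
```

    With "+" in the safe set only the space is encoded, which matches the URL shown in the question. With an empty safe set the plus sign becomes %2B.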

    By contrast, passing your query parameters through the params argument of httpx.get() does handle this for you. Each value is URL-encoded when the query string is built, so + becomes %2B:

    import httpx
    # The "+" characters in the value are encoded as %2B in the final URL
    response = httpx.get('https://reqres.in/', params={'a': 'b+c+d'})
    print(response.url) # Result: 'https://reqres.in/?a=b%2Bc%2Bd'
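
    Why the unencoded plus sign matters on the receiving end can also be sketched with the standard library. This assumes the server decodes the query string form-style, where a literal + means a space. Nothing in the question confirms how the bank API decodes it, and the sample value is made up.

```python
from urllib.parse import parse_qs, urlencode

# Shortened, made-up sample value containing a plus sign.
value = "pix0114+5548"

# Literal "+" left in the query string: form-style decoding reads it as a space.
unencoded = parse_qs("qrcode=" + value)["qrcode"][0]

# Encoding the value first turns "+" into %2B, so it survives decoding.
encoded_query = urlencode({"qrcode": value})
round_tripped = parse_qs(encoded_query)["qrcode"][0]

print(unencoded)      # pix0114 5548
print(encoded_query)  # qrcode=pix0114%2B5548
print(round_tripped)  # pix0114+5548
```

    If the server decodes this way, the hand-built URL would deliver a QR code string with a space where the plus sign used to be, while the encoded form arrives intact.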
    


    In short: if you pass URL parameters and need reserved characters such as + to be percent-encoded, use the params= argument rather than building the query string yourself.