
Parameter encoded with encodeURIComponent, how to get it in a Python Bottle server?


Let's say I send a request with JavaScript:

fetch("/query?q=" + encodeURIComponent("c'est un château? Yes & no!"));

to a Python Bottle server:

from bottle import Bottle, request
app = Bottle()
@app.route("/query")
def query():
    q = request.query.get("q")   # how to decode it?
    print(q)
app.run()

How do I decode request.query.get("q") so that it reverses the encodeURIComponent encoding? In this example, the ? and & are correctly decoded, but the â is not: print(q) gives c'est un chÃ¢teau? Yes & no!


Solution

  • Bottle implements its own query parsing in _parse_qsl:

    def _parse_qsl(qs):
        r = []
        for pair in qs.split('&'):
            if not pair: continue
            nv = pair.split('=', 1)
            if len(nv) != 2: nv.append('')
            key = urlunquote(nv[0].replace('+', ' '))
            value = urlunquote(nv[1].replace('+', ' '))
            r.append((key, value))
        return r
    

    urlunquote is either urllib.unquote in Python 2.x, or urllib.parse.unquote with encoding preset to 'latin1' in Python 3.x:

        import functools  # imported at the top of bottle.py
        from urllib.parse import urlencode, quote as urlquote, unquote as urlunquote
        urlunquote = functools.partial(urlunquote, encoding='latin1')
    

    It's this Latin-1 assumption that leads to the result you're seeing, whereas unquote's default encoding, 'utf-8', would work:

    >>> from urllib.parse import unquote
    >>> quoted = "c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"
    >>> unquote(quoted, encoding="latin1")
    "c'est un chÃ¢teau? Yes & no!"
    >>> unquote(quoted)
    "c'est un château? Yes & no!"
    
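To see where the extra character comes from, here is a minimal stdlib-only sketch. It assumes that quote with safe="!*'()" is a close enough stand-in for JavaScript's encodeURIComponent for this input; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

text = "c'est un château? Yes & no!"

# Approximate encodeURIComponent: UTF-8 percent-escapes, with !*'() left as-is.
quoted = quote(text, safe="!*'()")

# Decode the same escapes under two different encoding assumptions.
as_latin1 = unquote(quoted, encoding="latin1")  # what Bottle's urlunquote does
as_utf8 = unquote(quoted)                       # encoding defaults to UTF-8

print(quoted)
print(len(as_utf8), len(as_latin1))
```

The â is sent as the two bytes %C3%A2. Read as UTF-8 they form one character; read as Latin-1 they become two, so the Latin-1 result is one character longer.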

    The unquoted values are then fed into a FormsDict, where either attribute access or calling the getunicode method would give you the UTF-8 version, whereas the get method and index access give Latin-1:

    >>> from bottle import FormsDict
    >>> fd = FormsDict()
    >>> fd["q"] = "c'est un chÃ¢teau? Yes & no!"
    >>> fd["q"]
    "c'est un chÃ¢teau? Yes & no!"
    >>> fd.get("q")
    "c'est un chÃ¢teau? Yes & no!"
    >>> fd.q
    "c'est un château? Yes & no!"
    >>> fd.getunicode("q")
    "c'est un château? Yes & no!"
    

    Alternatively, you can call decode to get a copy in which every access method returns the UTF-8 version:

    >>> utfd = fd.decode("utf8")
    >>> utfd["q"]
    "c'est un château? Yes & no!"
    >>> utfd.get("q")
    "c'est un château? Yes & no!"
    >>> utfd.q
    "c'est un château? Yes & no!"
    >>> utfd.getunicode("q")
    "c'est un château? Yes & no!"
    
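If you would rather bypass FormsDict entirely, one option is to parse the raw query string yourself with urllib.parse.parse_qs, which decodes percent-escapes as UTF-8 by default. This is only a sketch: the helper name first_query_value is hypothetical, and inside a Bottle handler you would pass it request.query_string instead of the hard-coded sample string.

```python
from urllib.parse import parse_qs


def first_query_value(query_string, name, default=""):
    """Return the first value of `name` from a raw query string, decoded as UTF-8."""
    values = parse_qs(query_string).get(name)
    return values[0] if values else default


# Sample raw query string, as the browser would send it after encodeURIComponent.
raw = "q=c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"
print(first_query_value(raw, "q"))
```

In most cases request.query.q or request.query.getunicode("q") is simpler; the manual route is mainly useful when you want full control over the decoding.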

    This is covered in the docs:

    Additionally to the normal dict-like item access methods (which return unmodified data as native strings), [FormsDict] also supports attribute-like access to its values. Attributes are automatically de- or recoded to match input_encoding (default: ‘utf8’).

    and:

    To simplify dealing with lots of unreliable user input, FormsDict exposes all its values as attributes, but with a twist: These virtual attributes always return properly encoded unicode strings, even if the value is missing or character decoding fails. They never return None or throw an exception, but return an empty string instead:

    name = request.query.name    # may be an empty string
    

    ...

    >>> request.query['city']
    'GÃ¶ttingen'  # An utf8 string provisionally decoded as ISO-8859-1 by the server
    >>> request.query.city
    'Göttingen'  # The same string correctly re-encoded as utf8 by bottle
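The re-encoding described in the docs can be approximated in plain Python. This is a rough sketch of the idea, not Bottle's actual implementation; the function name recode and its parameters are made up for illustration.

```python
def recode(value, input_encoding="utf8", default=""):
    """Undo a provisional Latin-1 decode, falling back to `default` on failure."""
    try:
        # Recover the original bytes, then decode them with the real encoding.
        return value.encode("latin1").decode(input_encoding)
    except UnicodeError:
        # Mirrors the documented behaviour of returning an empty string.
        return default


print(recode("GÃ¶ttingen"))        # the Latin-1 view of UTF-8 bytes
print(repr(recode("Göttingen")))   # bytes that are not valid UTF-8
```

The fallback branch is why attribute access never raises: a value whose bytes are not valid in the target encoding simply comes back as an empty string.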