Let's say I send a request with JavaScript:
fetch("/query?q=" + encodeURIComponent("c'est un château? Yes & no!"));
to a Python Bottle server:
from bottle import Bottle, request

app = Bottle()

@app.route("/query")
def query():
    q = request.query.get("q")  # how to decode it?
    print(q)

app.run()
How do I decode request.query.get("q") so that it reverses the encodeURIComponent encoding? In this example, the ? and & are correctly decoded, but the â is not: print(q) gives c'est un chÃ¢teau? Yes & no!
Bottle implements its own query parsing, _parse_qsl:
def _parse_qsl(qs):
    r = []
    for pair in qs.split('&'):
        if not pair: continue
        nv = pair.split('=', 1)
        if len(nv) != 2: nv.append('')
        key = urlunquote(nv[0].replace('+', ' '))
        value = urlunquote(nv[1].replace('+', ' '))
        r.append((key, value))
    return r
urlunquote is either urllib.unquote in Python 2.x, or urllib.parse.unquote with encoding preset to 'latin1' in Python 3.x:
from urllib.parse import urlencode, quote as urlquote, unquote as urlunquote
urlunquote = functools.partial(urlunquote, encoding='latin1')
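To make the effect of that preset concrete, here is a small self-contained sketch that re-creates the parsing loop with only the standard library and runs it on the query string from the question. It is an illustration, not Bottle's own code path, and parse_qsl_sketch is a hypothetical name.

```python
import functools
from urllib.parse import unquote

# Same preset as in the excerpt above: percent-escapes are decoded as Latin-1.
urlunquote = functools.partial(unquote, encoding="latin1")

def parse_qsl_sketch(qs):
    """Illustrative re-creation of the parsing loop shown earlier (hypothetical name)."""
    pairs = []
    for pair in qs.split("&"):
        if not pair:
            continue
        # A pair without "=" gets an empty value, as in the original loop.
        name, _, value = pair.partition("=")
        pairs.append((urlunquote(name.replace("+", " ")),
                      urlunquote(value.replace("+", " "))))
    return pairs

# What encodeURIComponent produces for the string in the question.
qs = "q=c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"
print(parse_qsl_sketch(qs))
```

The two UTF-8 bytes of â (C3 A2) are each read as a separate Latin-1 character, which is where the two-character sequence in the output comes from.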
It's this 'latin1' assumption that leads to the result you're seeing, whereas unquote's default of 'utf-8' would work:
>>> from urllib.parse import unquote
>>> quoted = "c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"
>>> unquote(quoted, encoding="latin1")
"c'est un chÃ¢teau? Yes & no!"
>>> unquote(quoted)
"c'est un château? Yes & no!"
The unquoted values are then fed into a FormsDict, where either attribute access or calling the getunicode method gives you the UTF-8 version, whereas the get method and index access give Latin-1:
>>> from bottle import FormsDict
>>> fd = FormsDict()
>>> fd["q"] = "c'est un chÃ¢teau? Yes & no!"
>>> fd["q"]
"c'est un chÃ¢teau? Yes & no!"
>>> fd.get("q")
"c'est un chÃ¢teau? Yes & no!"
>>> fd.q
"c'est un château? Yes & no!"
>>> fd.getunicode("q")
"c'est un château? Yes & no!"
Alternatively, you can decode a version where everything is UTF-8:
>>> utfd = fd.decode("utf8")
>>> utfd["q"]
"c'est un château? Yes & no!"
>>> utfd.get("q")
"c'est un château? Yes & no!"
>>> utfd.q
"c'est un château? Yes & no!"
>>> utfd.getunicode("q")
"c'est un château? Yes & no!"
This is covered in the docs:
Additionally to the normal dict-like item access methods (which return unmodified data as native strings), [FormsDict] also supports attribute-like access to its values. Attributes are automatically de- or recoded to match input_encoding (default: ‘utf8’).
and:
To simplify dealing with lots of unreliable user input, FormsDict exposes all its values as attributes, but with a twist: These virtual attributes always return properly encoded unicode strings, even if the value is missing or character decoding fails. They never return None or throw an exception, but return an empty string instead:
name = request.query.name # may be an empty string
...
>>> request.query['city']
'GÃ¶ttingen' # An utf8 string provisionally decoded as ISO-8859-1 by the server
>>> request.query.city
'Göttingen' # The same string correctly re-encoded as utf8 by bottle