I'm trying to parse a query string like this:
filename=logo.txt\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x01x&filename=.hidden.txt
Since it mixes escaped bytes and text, I tried to convert it so that it produces the desired percent-encoded URL output, like so:
import urllib.parse
extended = 'filename=logo.txt\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x01x&filename=.hidden.txt'
fixbytes = bytes(extended, 'utf-8')
print(fixbytes)
fixbytes = fixbytes.decode("unicode_escape")
print(fixbytes)
algoext = '?' + urllib.parse.quote(fixbytes, safe='?&=')
print(algoext)
This outputs:
b'filename=logo.txt\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x01x&filename=.hidden.txt'
filename=logo.txtx&filename=.hidden.txt
?filename=logo.txt%C2%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%01x&filename=.hidden.txt
Where does the %C2 byte come from? It's not in the source string and it's not in any of the intermediate steps. What could I do other than manually remove it from the final output string?
P.S. I'm relying on a library to generate the string so changing the way it's represented initially is not an option.
This also achieves my goal:
querystring = '?' + extended.replace('\\x', '%')
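For what it's worth, the %C2 appears to come from urllib.parse.quote itself: when it is given a str, it encodes it as UTF-8 by default, and the code point U+0080 produced by unicode_escape becomes the two bytes C2 80 in UTF-8. Below is a minimal sketch of two ways around that, using a shortened version of the string above (the shortened literal and the variable names are just for illustration):

```python
import urllib.parse

# Shortened version of the string from the question; the backslashes are literal characters.
extended = 'filename=logo.txt\\x80\\x00\\x01x&filename=.hidden.txt'

# unicode_escape turns the literal "\x80" into the code point U+0080 (a str, not a byte).
text = extended.encode('ascii').decode('unicode_escape')

# quote() encodes str input as UTF-8 by default; U+0080 in UTF-8 is the two bytes C2 80.
default_quoted = urllib.parse.quote(text, safe='?&=')

# Option 1: ask quote() to use latin-1, which maps U+0000..U+00FF to single bytes.
latin1_quoted = urllib.parse.quote(text, safe='?&=', encoding='latin-1')

# Option 2: convert to bytes first; quote() percent-encodes bytes input as-is.
raw = text.encode('latin-1')
bytes_quoted = urllib.parse.quote(raw, safe='?&=')

print('?' + default_quoted)
print('?' + latin1_quoted)
print('?' + bytes_quoted)
```

The replace('\\x', '%') approach gives the same result for this input, but it assumes every escape in the string has the form \xHH and that no literal "\x" appears for any other reason.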