I've a smartphone. On this smartphone, I've a mobile hotspot, essentially a portable WiFi network that pipes my phone's internet access to my laptop.
On my laptop, I've Python 3 and the requests library. Here's using Python and requests to get google.com with my phone's hotspot. (The result is exactly the same using "real WiFi".)
>>> import requests
>>> x = requests.get("http://google.com")
>>> x.apparent_encoding; x.text[:100]
'ISO-8859-2'
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content'
Good! Everything is going as planned.
Also on my laptop, I've Factor, and it has an easy-to-use wgetter in the standard library. Here's http-get working on a "normal" WiFi network.
IN: scratchpad "http://google.com" http-get nip
--- Data stack:
"<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org..."
Success!
Well, no. Here's http-get on my phone's hotspot:
IN: scratchpad "http://google.com" http-get nip
--- Data stack:
"\x1f\b\0\0\0\0\0\0\x03Å<ëzÛ¶ÿÏSÐH+K+\"u\x17eÚ&iâÓ¤Ik§i7Íú\x03IHbÄIʲ#ë]öQw\x06\0..."
Uh.
And it's not just Google. http-getting Stack Overflow, or any other website, over my phone's network gives rather similar results.
Printing that string:
...
No? Ah, well, OK.
Factor is 100% UTF-8 by default. ISO-8859 should be translatable to UTF-8, and indeed it is when not using my phone's internet.
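To spell out what I mean by "translatable", here's a minimal Python sketch of going from ISO-8859-2 bytes to UTF-8 bytes. The helper name `transcode_to_utf8` and the sample text are made up for illustration; they aren't from requests or Factor.

```python
def transcode_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from source_charset, then re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# Hypothetical sample text; every character exists in ISO-8859-2 (Latin-2).
sample_text = "zażółć gęślą jaźń"
latin2_bytes = sample_text.encode("iso-8859-2")

utf8_bytes = transcode_to_utf8(latin2_bytes, "iso-8859-2")

# The byte sequences differ, but both decode to the same text.
print(latin2_bytes != utf8_bytes)
print(utf8_bytes.decode("utf-8") == sample_text)
```

This only works when the bytes really are text in the named charset, which is what I was assuming about the response body.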
I know mobile service providers have a reputation for injecting Bad Things into served content. But if Python reports the same encoding on both networks and treats the content the same, what's going on here?
Factor is HEAD. Python is 3.5. Laptop is Ubuntu 15.10, Android is 5.1.something, and, probably most importantly, my mobile service provider is StraightTalk.
As the Python demonstration shows, I don't normally experience issues with page content.
https://github.com/factor/factor/issues/1589
I didn't think to look at the headers.
The answer?
content-encoding: Accept-Encoding on normal WiFi.
content-encoding: gzip on hotspot.
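That also explains why Python looked fine: requests decompresses gzip-encoded bodies for you before handing back `.text`. Here's a rough sketch of what honoring that header amounts to, using only Python's standard `gzip` module. The helper `decode_body` and the sample HTML are made up for illustration, and the sketch handles only gzip, nothing else.

```python
import gzip


def decode_body(raw: bytes, content_encoding: str = "") -> bytes:
    """Return the body, gunzipped if the Content-Encoding header lists gzip."""
    encodings = [e.strip().lower() for e in content_encoding.split(",") if e.strip()]
    if "gzip" in encodings:
        return gzip.decompress(raw)
    # No gzip in the header: assume the bytes are already the plain body.
    return raw


# Hypothetical stand-in for a page body, compressed locally instead of fetched.
html = b'<!doctype html><html itemscope="">'
compressed = gzip.compress(html)

# A gzip stream starts with the magic bytes 0x1f 0x8b.
print(compressed[:2] == b"\x1f\x8b")
print(decode_body(compressed, "gzip") == html)
print(decode_body(html) == html)
```

A client that skips this step and treats the compressed bytes as text gets the kind of garbage string shown on the data stack above.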
Now how to ungzip with Factor is another question.