Hey guys, I am trying to get the HTML of a page using urllib.urlopen(url).read(), but for some sites all I get is data like this: 6\xbdW\xb6\xd6\xff\xca\x9d\x9bO|\xc0\x96a\xc7\xc8\xf7\xa7\x10-\x8aM{\xf8\x and I have no clue what it is or why I am getting it. I tried googling it; some people said it is an encoding/decoding problem, and I tried that as well, but as you can see, no luck there. So please guide me in this darkness. Here is my code:
import urllib

url = "http://mangafox.me/manga/online_the_comic/c001/1.html"  # fails for this site and a few others
page = urllib.urlopen(url).read()
print page
Printing page gives the kind of garbage shown above.
This page is served in gzip format, so you have to decompress it before using the data. Trying to decode the raw bytes as text fails with:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)
The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which means the data is gzip-compressed.
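As a rough sketch of the decompression step, something like the following might work. The helper name maybe_gunzip is made up for illustration and is not part of urllib. It assumes the whole response body is already in memory as a byte string, and it only checks the first two bytes for the gzip magic number before handing the data to zlib.

```python
import zlib

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data):
    """Hypothetical helper: decompress data if it looks like gzip, else return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(data, 16 + zlib.MAX_WBITS)
    return data
```

With the code from the question, that would be used as page = maybe_gunzip(urllib.urlopen(url).read()). Pages that are not compressed pass through untouched, so the same call can be used for every site.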
You should take a look at this question: