python gzip decoding transfer-encoding

Decoding the response of a GET request for Persian websites


I'm writing a function that sends a request to a website, reads the response, and parses its content. When I send a request to Persian sites, though, it can't decode the content.

from urllib.request import urlopen

def gather_links(page_url):
    html_string = ''
    try:
        response = urlopen(page_url)
        if 'text/html' in response.getheader('Content-Type'):
            html_bytes = response.read()
            # This is the line that raises the error shown below
            html_string = html_bytes.decode("utf-8")
    except Exception as e:
        print(str(e))

For example, https://www.entekhab.ir/ produces this error:

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

How can I change the code so that it decodes this kind of site too?


Solution

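  • The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the server is almost certainly sending a gzip-compressed body, and urlopen does not decompress it for you. The problem is the compression, not the Persian text. The simplest fix is to use requests instead of urllib, since requests decompresses gzip responses transparently.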

    import requests
    
    response = requests.get('https://www.entekhab.ir/')
    print(response.text)
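
If you would rather stay with urllib, the same idea can be sketched by hand: check whether the body is gzip-compressed and decompress it before decoding. This is only a minimal sketch, not a tested drop-in replacement. The helper names decode_body and fetch_html are made up for illustration, and it assumes the server uses gzip (not deflate or brotli) and that UTF-8 is a reasonable fallback when no charset is declared.

```python
import gzip
from urllib.request import urlopen

# First two bytes of every gzip stream; 0x8b is the byte from the error message.
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Hypothetical helper: gunzip the raw body if needed, then decode it to text."""
    is_gzip = (content_encoding or "").lower() == "gzip" or raw[:2] == GZIP_MAGIC
    if is_gzip:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_html(page_url):
    """Hypothetical helper: return the page as text, or '' if it is not HTML."""
    with urlopen(page_url) as response:
        if "text/html" not in (response.getheader("Content-Type") or ""):
            return ""
        # Use the charset from the Content-Type header, falling back to UTF-8 (assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return decode_body(
            response.read(),
            response.getheader("Content-Encoding"),
            charset,
        )
```

Calling fetch_html('https://www.entekhab.ir/') should then return the decoded page instead of raising, provided the site really is serving gzip. Checking the magic bytes as well as the Content-Encoding header covers servers that compress the body without declaring it.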