So I want to print out the HTML of a website:
from urllib.request import urlopen
http = urlopen('http://www.google.de/').read()
print(http)
But in the output all newlines are printed as \n and the string begins with a b', which has something to do with a byte array, as my Google research told me? Sorry, I'm new to Python xD
So my question is: how can I print the HTML code as a normal string, with newlines as they would be shown in a text editor?
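Have a look at the urlopen documentation: read() returns a bytes object, not a str, which is why print shows the b'...' prefix and the escaped \n. The HTML header of the page declares charset=UTF-8, so you need to decode the bytes with that encoding. Change your line to: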
print(http.decode('utf-8'))
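If the HTML output contains bytes that are not valid UTF-8 (for example, special characters in a different encoding), decode raises a UnicodeDecodeError. To silently drop those bytes instead, use: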
print(http.decode('utf-8', errors='ignore'))
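If you would rather not hard-code the encoding, here is a minimal sketch that takes the charset from the response's Content-Type header and falls back to UTF-8 when none is declared. The helper names decode_body and fetch_html are made up for this example, and the UTF-8 fallback is an assumption, not something the server guarantees.

```python
from typing import Optional
from urllib.request import urlopen


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode raw response bytes; assumes UTF-8 when no charset is given."""
    # errors='replace' puts U+FFFD in place of undecodable bytes instead of raising
    return raw.decode(charset or 'utf-8', errors='replace')


def fetch_html(url: str) -> str:
    """Download a page and return its body as a str (hypothetical helper)."""
    with urlopen(url) as response:
        # get_content_charset() returns the charset from the Content-Type
        # header, or None if the header does not declare one
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

With that in place, print(fetch_html('http://www.google.de/')) should print the page with real line breaks. I used errors='replace' rather than errors='ignore' here so that undecodable bytes stay visible as a replacement character instead of vanishing.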