I want to print out the HTML of a website:

from urllib.request import urlopen

http = urlopen('http://www.google.de/').read()
print(http)

But in the output all newlines are printed as \n and the string begins with b', which, as far as my Google research told me, has something to do with a byte array. Sorry, I'm new to Python.

So my question is: how can I print the HTML code as a normal string, with newlines shown as they would be in a text editor?

1 Answer

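Have a look at the urlopen documentation: read() returns a bytes object, not a str, which is why the output starts with b' and shows newlines as \n. The bytes have to be decoded into a string first. The page's header states charset=UTF-8, so you need to change your print line to: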

print(http.decode('utf-8'))
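
To see the difference without touching the network, here is a small sketch. The `raw` value is a made-up stand-in for what `urlopen(...).read()` returns, not real output from the site:

```python
# Hypothetical stand-in for the bytes returned by urlopen(...).read()
raw = "<html>\n<body>Grüße</body>\n</html>".encode("utf-8")

# Printing bytes shows their repr: a b'...' literal with escaped newlines
print(raw)

# Decoding turns the bytes into a str, so newlines are printed as real line breaks
text = raw.decode("utf-8")
print(text)
```

The first print shows everything on one line with a leading b', the second shows three separate lines.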

If the HTML contains bytes that are not valid UTF-8, decode() raises a UnicodeDecodeError. In that case you can drop the offending bytes with:

print(http.decode('utf-8', errors='ignore'))
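
Instead of hard-coding the encoding, you could also read it from the response headers. The following is only a sketch: `decode_body` and `fetch_html` are helper names I made up for illustration, and falling back to UTF-8 with replacement characters is an assumption, not something the server guarantees to be right.

```python
from typing import Optional
from urllib.request import urlopen


def decode_body(raw: bytes, charset: Optional[str], fallback: str = "utf-8") -> str:
    """Decode raw bytes with the declared charset, falling back to UTF-8."""
    encoding = charset or fallback
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown charset: keep going and mark undecodable bytes with U+FFFD
        return raw.decode(fallback, errors="replace")


def fetch_html(url: str) -> str:
    """Download a page and return it as a str (hypothetical helper)."""
    with urlopen(url) as response:
        # Charset from the Content-Type header, or None if the server sent none
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

With that in place, `print(fetch_html('http://www.google.de/'))` should print the page with real line breaks. Unlike errors='ignore', errors='replace' leaves a visible marker where a byte could not be decoded, which makes encoding problems easier to spot.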