Print HTML From Url [duplicate]

Question

So I want to print out the HTML of a website:

from urllib.request import urlopen

http = urlopen('http://www.google.de/').read()
print(http)

But in the output all newlines are printed as \n and the string begins with b', which has something to do with a byte array, as my Google research told me. Sorry, I'm new to Python xD

So my question is: how can I print the HTML code as a normal string, with newlines, as it would be shown in a text editor?

S.B · Accepted Answer · 2021-09-20 19:26:22Z

4

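Have a look at the urlopen documentation: read() returns a bytes object, which is why the output starts with b' and shows newlines as \n. The HTML header declares charset=UTF-8, so decode the bytes with that encoding before printing. Change your print line to: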

print(http.decode('utf-8'))

If the response contains bytes that are not valid UTF-8 (for example due to locale settings), decode() raises a UnicodeDecodeError. To drop those bytes instead, use:

print(http.decode('utf-8', errors='ignore'))
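If you would rather not hard-code the encoding, the charset can usually be read from the Content-Type response header. The following is only a sketch under a few assumptions: the helper names decode_body and fetch_html are made up for illustration, UTF-8 is assumed as the fallback when the server declares no charset, and errors="replace" is used in place of errors="ignore" so that undecodable bytes stay visible as U+FFFD rather than disappearing.

```python
from urllib.request import urlopen


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode response bytes to str (hypothetical helper).

    Falls back to UTF-8 when no charset is given (an assumption).
    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return raw.decode(charset or fallback, errors="replace")


def fetch_html(url):
    """Download a page and return its HTML as str (hypothetical helper).

    Calling this performs a network request.
    """
    with urlopen(url) as response:
        # get_content_charset() reads the charset parameter of the
        # Content-Type header and returns None if it is missing.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Offline demonstration of the difference between bytes and str:
raw = b"<html>\n<body>Gr\xc3\xbc\xc3\x9fe</body>\n</html>"
print(raw)               # bytes repr: starts with b' and shows \n escapes
print(decode_body(raw))  # str: real line breaks and decoded characters
```

With this in place, print(fetch_html('http://www.google.de/')) should print the page with real line breaks, whichever charset the server reports.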

edited Sep 20, 2021 at 19:26

S.B

answered Jun 3, 2016 at 12:01

Maximilian Peters
