6

I want to read an HTML file in Python 3.4.3.

I have tried:

import urllib.request
fname = r"C:\Python34\html.htm"
HtmlFile = open(fname,'w')
print (HtmlFile)

This prints:

<_io.TextIOWrapper name='C:\\Python34\\html.htm' mode='w' encoding='cp1252'>

I want to get the HTML source so that I can parse it with Beautiful Soup.

    If you want to read you shouldn't open it for writing ;) open(fname, 'w') => open(fname, 'r'). Commented Sep 13, 2015 at 7:58

2 Answers

14

You will have to read the contents of the file.

# 'r' opens the file for reading; 'w' would truncate it to an empty file
HtmlFile = open(fname, 'r', encoding='utf-8')
source_code = HtmlFile.read()
HtmlFile.close()
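
The same read can be sketched with a `with` block, which closes the file automatically even if reading fails. This is a minimal sketch, not part of the original answer: the helper name `read_html`, the throwaway demo file, and its content are assumptions for illustration, and UTF-8 is assumed as the file's encoding. The string it returns is what you would pass on to Beautiful Soup.

```python
import os
import tempfile


def read_html(path, encoding="utf-8"):
    """Return the text of an HTML file (hypothetical helper)."""
    # Mode 'r' reads the file; mode 'w' would truncate it to zero length.
    with open(path, "r", encoding=encoding) as html_file:
        return html_file.read()


# Create a small demo file so the sketch is self-contained (assumed content).
demo_path = os.path.join(tempfile.mkdtemp(), "html.htm")
with open(demo_path, "w", encoding="utf-8") as demo_file:
    demo_file.write("<html><body><p>Hello</p></body></html>")

source_code = read_html(demo_path)
print(source_code)
```

With a real file you would call `read_html(fname)` with the path from the question instead of the demo path.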

3 Comments

I'm getting this error for the above line:
File "C:/Python34/pretty.py", line 4, in <module>
source_code = HtmlFile.read()
File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4411: character maps to <undefined>
Use encoding to read the file - HtmlFile = open(fname, 'r', encoding='utf-8')
Remember to close the file when you're done: HtmlFile.close()
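
The error in the comment above can be reproduced in isolation. Without an `encoding` argument, `open` falls back to the platform default, which is cp1252 in the traceback, and byte 0x81 has no mapping in that code page. The snippet below is a sketch under assumptions: the sample text is made up, and the real file's encoding is unknown, so UTF-8 is only a likely guess.

```python
# Sample bytes (assumed content): U+0141 encodes in UTF-8 as 0xC5 0x81.
raw = "<p>\u0141\u00f3d\u017a</p>".encode("utf-8")

try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError as exc:
    # Byte 0x81 is undefined in Python's cp1252 codec.
    cp1252_ok = False
    print("cp1252 failed:", exc.reason)

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode("utf-8")

# errors="replace" never raises, but undefined bytes become U+FFFD.
lenient = raw.decode("cp1252", errors="replace")
```

The same `encoding` and `errors` arguments are accepted by `open`, so passing `encoding='utf-8'` there has the same effect as the explicit decode here.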
1

I was trying to read a saved HTML file in the same folder. I tried the code mentioned by Vikasa but got an error, so I changed the code and tried again, and it worked for me. The code is as follows:

    fname = 'page_source.html'  # this HTML file is stored in the same folder as the script
    html_file = open(fname, 'r')  # no encoding given, so the platform default is used
    source_code = html_file.read()
    html_file.close()

Print the HTML page using

print(source_code)

This prints the content read from the page_source.html file.
