Reading Local HTML File in Python

Question

I've been reviewing examples of how to read HTML from websites using XPath and lxml. For some reason, when I try the same thing with a local file, I keep running into this error:

AttributeError: 'str' object has no attribute 'content'

This is the code

with open(r'H:\Python\Project\File','r') as f:
    file = f.read()
f.close()

tree = html.fromstring(file.content)

file is already a string, so change the call to tree = html.fromstring(file). The with block closes f automatically, so there is no need to close it again; remove f.close().
– Tiny.D, Commented Nov 26, 2017 at 2:51

James · Accepted Answer · 2017-11-26 03:34:15Z

1

You have a few problems with your code. It looks like you adapted code that parses HTML from an HTTP/HTTPS request. In that case, .content is an attribute (not a method) of the response object that holds the response body as bytes.

However, when reading from a file, f.read() inside your with block already returns the file's contents as a string, so there is no .content to access. You also don't need to call .close(); the context manager takes care of that for you.
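Both points can be checked without lxml or a network request. The following is a minimal sketch using only the standard library; the sample file and its path are hypothetical stand-ins created for the demonstration, not the asker's file.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created only so the sketch is self-contained.
sample_path = Path(tempfile.mkdtemp()) / "sample.html"
sample_path.write_text("<html><body><p>Hello</p></body></html>", encoding="utf-8")

with open(sample_path, "r", encoding="utf-8") as f:
    text = f.read()

# Leaving the with block has already closed the file.
file_is_closed = f.closed

# f.read() returned a plain str, and str has no .content attribute.
try:
    text.content
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(type(text).__name__, file_is_closed)
print(error_message)
```

The second print shows the same AttributeError message as in the question, which is why dropping .content is the fix.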

Try this:

from lxml import html

with open(r'H:\Python\Project\File', 'r') as f:
    tree = html.fromstring(f.read())
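For a version that can be run end to end, here is a rough sketch that writes a small HTML file to a temporary directory, parses it the same way, and runs two XPath queries. The file contents, the temporary path, and the queries are made up for illustration; it assumes lxml is installed.

```python
import tempfile
from pathlib import Path

from lxml import html  # third-party package: pip install lxml

# Hypothetical sample document standing in for the local HTML file.
sample_path = Path(tempfile.mkdtemp()) / "sample.html"
sample_path.write_text(
    "<html><body><h1>Title</h1>"
    "<a href='/a'>A</a><a href='/b'>B</a></body></html>",
    encoding="utf-8",
)

# Read the file as text and hand the string straight to lxml.
with open(sample_path, "r", encoding="utf-8") as f:
    tree = html.fromstring(f.read())

# XPath queries return lists of matching text nodes / attribute values.
heading = tree.xpath("//h1/text()")
links = tree.xpath("//a/@href")

print(heading)
print(links)
```

If you would rather not read the file yourself, lxml.html.parse() accepts a filename and returns a tree whose root you can get with .getroot(); which of the two reads better is mostly a matter of taste.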


1 Comment

T. C. Over a year ago

Thank you, this worked great! And it is much more concise.

johnashu · Accepted Answer · 2017-11-26 03:54:31Z

0

Try passing encoding='utf-8' when you open the file:

f1 = open(new_file + '.html', 'r', encoding="utf-8")
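Why the encoding argument can matter: in text mode, open() decodes the file's bytes with the platform's default encoding unless you name one, and on Windows that default is often not UTF-8. The sketch below uses only the standard library and a hypothetical sample file to show the same bytes decoded correctly, decoded with the wrong codec, and read raw in binary mode.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file containing a non-ASCII character, saved as UTF-8.
sample_path = Path(tempfile.mkdtemp()) / "sample.html"
sample_path.write_bytes("<p>café</p>".encode("utf-8"))

# Text mode with the matching encoding gives the intended string.
with open(sample_path, "r", encoding="utf-8") as f:
    text = f.read()

# Text mode with a mismatched encoding silently produces garbled text.
with open(sample_path, "r", encoding="latin-1") as f:
    garbled = f.read()

# Binary mode returns the undecoded bytes.
with open(sample_path, "rb") as f:
    raw = f.read()

print(text)
print(garbled)
print(raw)
```

Note that this addresses garbled characters or decode errors, not the AttributeError in the question, which comes from calling .content on a string.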
