I've been reviewing examples of how to read HTML from websites using XPath and lxml. For some reason, when I try the same thing with a local file, I keep running into this error:

AttributeError: 'str' object has no attribute 'content'

This is the code:

with open(r'H:\Python\Project\File','r') as f:
    file = f.read()
f.close()

tree = html.fromstring(file.content)

  • file is already a string, so change the line to tree = html.fromstring(file). The with block closes f automatically, so there is no need to close it again; remove f.close(). Commented Nov 26, 2017 at 2:51

2 Answers

1

You have a few problems with your code. It looks like you are adapting code that parses HTML from an HTTP/HTTPS request. In that case, .content (an attribute, not a method call, on a requests-style response object) gives you the response body as bytes.

However, when reading from a file, f.read() already returns the file's contents as a string inside your with block, and a str has no .content attribute, which is what the AttributeError is telling you. Also, you don't need to call f.close(); the context manager takes care of that for you.

Try this:

with open(r'H:\Python\Project\File','r') as f:
    tree = html.fromstring(f.read())
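
To check this without depending on the path in the question, here is a minimal, self-contained sketch. The file name sample.html and its markup are made up for illustration; it writes the file to a temporary directory, parses it with html.fromstring, and runs an XPath query on the result. It assumes lxml is installed.

```python
import tempfile
from pathlib import Path

from lxml import html

# Hypothetical sample page, written to a temp directory so the example is self-contained.
sample = "<html><body><h1>Title</h1><p class='note'>Hello</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.html"
    path.write_text(sample, encoding="utf-8")

    # f.read() already returns a str, so it is passed straight to fromstring().
    with open(path, "r", encoding="utf-8") as f:
        tree = html.fromstring(f.read())

# XPath queries work the same way as on a tree built from an HTTP response body.
notes = tree.xpath("//p[@class='note']/text()")
print(notes)
```

The tree is built while the file is still open, but it can be queried after the with block has exited, because fromstring has already consumed the string.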

1 Comment

Thank you, this worked great! And it is much more concise.
0

Try passing encoding='utf-8' when you open the file:

f1 = open(new_file + '.html', 'r', encoding="utf-8")
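
The reason this can matter is that open() in text mode falls back to a platform-dependent default encoding when none is given, so a UTF-8 file with non-ASCII characters may be decoded incorrectly or raise an error on some systems. The sketch below is an illustration only: the file name and sample text are invented, and it uses a with block rather than a bare open() so the file is closed automatically.

```python
import tempfile
from pathlib import Path

# Hypothetical sample content containing non-ASCII characters.
sample = "<html><body><p>café naïve</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.html"
    # Store the file as UTF-8 bytes, as a saved web page typically would be.
    path.write_bytes(sample.encode("utf-8"))

    # Explicit encoding: decoding no longer depends on the platform default.
    with open(path, "r", encoding="utf-8") as f1:
        text = f1.read()

# The decoded text matches the original string exactly.
print(text == sample)
```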
