AttributeError: 'HTTPResponse' object has no attribute 'split'

Question

I am trying to get some information from Google Finance, but I am getting this error:

AttributeError: 'HTTPResponse' object has no attribute 'split'

Here is my Python code:

import urllib.request
import urllib
from bs4 import BeautifulSoup

symbolsfile = open("Stocklist.txt")

symbolslist = symbolsfile.read()

thesymbolslist = symbolslist.split("\n")

i=0


while i<len (thesymbolslist):
    theurl = "http://www.google.com/finance/getprices?q=" + thesymbolslist[i] + "&i=10&p=25m&f=c"
    thepage = urllib.request.urlopen (theurl)
    print(thesymbolslist[i] + " price is " + thepage.split()[len(thepage.split())-1])
    i= i+1

What are you trying to do here? thepage.split()[len(thepage.split())-1])
– alecxe, Commented May 22, 2016 at 2:19
I am trying to get the page into a list and then get the last attribute from that list and print it.
– Zepol, Commented May 22, 2016 at 2:24
You need to read() from the thepage to get an actual string.
– Akshat Mahajan, Commented May 22, 2016 at 2:26

Community · Accepted Answer · 2017-05-23 11:51:33Z

12

The Cause of the Problem

This is because urllib.request.urlopen (theurl) returns an object representing the connection, not a string.

The Solution

To read the data from this connection, call read() on the response:

thepage = urllib.request.urlopen(theurl).read()

In Python 3 this gives you a bytes object, so it still has to be decoded into a string before the rest of your code will work (see the addendum below).
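As a minimal offline sketch of why the result of read() still needs decoding, the snippet below uses a hard-coded bytes value as a stand-in for what read() returns in Python 3. The payload contents are made up for illustration and are not real Google Finance output.

```python
# Hypothetical stand-in for urllib.request.urlopen(theurl).read():
# in Python 3 the body arrives as bytes, not str.
raw = b"COLUMNS=CLOSE\n101.5\n102.25\n"

tokens = raw.split()   # bytes has split(), but every token is still bytes
last = tokens[-1]

try:
    # Mixing str and bytes is what triggers the TypeError from the comments.
    message = "price is " + last
except TypeError as exc:
    message = "TypeError: " + str(exc)

# Decoding the token first makes the concatenation legal.
decoded_message = "price is " + last.decode("utf-8")
print(message)
print(decoded_message)
```

The exact wording of the TypeError differs between Python 3 versions, but the cause is the same: a str and a bytes object cannot be concatenated without an explicit decode.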

Addendum to the Solution

In Python 3, read() returns a bytes object rather than a str, whatever the page contains. Concatenating that bytes object with a string is what raises TypeError: Can't convert 'bytes' object to str implicitly.

The right approach is to find the correct character encoding and decode the bytes into a regular string using it:

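thepage = urllib.request.urlopen(theurl)
# read the declared character encoding from the `Content-Type` response header;
# get_content_charset() returns None when no charset is declared, so fall back
# to utf-8 (the fallback is an assumption, not something the server promised)
charset_encoding = thepage.info().get_content_charset() or 'utf-8'
# decode the raw bytes into a str using that encoding
thepage = thepage.read().decode(charset_encoding)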

It is sometimes safe to make the assumption that the character encoding is utf-8, in which case

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

does work more often than not. It's a statistically good guess if nothing else.
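Putting the pieces together, one possible shape for the whole script is sketched below. The helper names fetch_text, last_token and print_prices are hypothetical, the utf-8 fallback is an assumption, and the URL pattern is copied from the question, so the endpoint may no longer respond as it did in 2016.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Fetch a URL and return its body as str.

    Uses the charset declared in the Content-Type response header and
    falls back to fallback_encoding (an assumption) when none is declared.
    """
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or fallback_encoding
        return response.read().decode(charset)


def last_token(text):
    """Return the last whitespace-separated token of text, or None if empty."""
    tokens = text.split()
    return tokens[-1] if tokens else None


def print_prices(symbols_path="Stocklist.txt"):
    """Print the last token of each symbol's response, as the question intends."""
    with open(symbols_path) as symbols_file:
        symbols = symbols_file.read().split()
    for symbol in symbols:
        # Same URL pattern as in the question.
        url = ("http://www.google.com/finance/getprices?q=" + symbol
               + "&i=10&p=25m&f=c")
        print(symbol + " price is " + str(last_token(fetch_text(url))))
```

Calling print_prices() runs the loop from the question. Iterating over the symbols directly replaces the manual index counter, and text.split()[-1] replaces the len(...)-1 arithmetic.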

edited May 23, 2017 at 11:51

CommunityBot


answered May 22, 2016 at 2:28

Akshat Mahajan


3 Comments

Zepol Over a year ago

once i did that it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly

Akshat Mahajan Over a year ago

It's because the encoding of the string you're receiving is not something Python understands. Give me a minute to provide a fix.

le_m Over a year ago

Your solution is more robust since it does not depend on the source encoding, so OP: better mark this one the right answer :)

le_m · Accepted Answer · 2016-05-22 02:44:36Z

4

Checking the documentation might save you time in the future. It says that, for HTTP URLs, the urlopen() function returns an HTTPResponse object, which has a read() method. In Python 3, read() returns bytes, so you need to decode the output from the source encoding, in this case UTF-8. So just write

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')
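Hard-coding 'utf-8' works only when the server really sends UTF-8. The small offline sketch below shows what happens otherwise, using a made-up ISO-8859-1 payload rather than anything Google is known to return.

```python
# Hypothetical payload: "café" as sent by a server that uses ISO-8859-1.
raw = "café".encode("iso-8859-1")

try:
    text = raw.decode("utf-8")   # wrong guess: the byte 0xE9 is not valid UTF-8 here
    guess_worked = True
except UnicodeDecodeError:
    guess_worked = False

# Decoding with the encoding the server actually used recovers the text.
text = raw.decode("iso-8859-1")
print(guess_worked, text)
```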

edited May 22, 2016 at 2:44

answered May 22, 2016 at 2:28

le_m


5 Comments

Zepol Over a year ago

once i did that it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly

le_m Over a year ago

Python 3? Then see stackoverflow.com/questions/16699362/… Try thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

Akshat Mahajan Over a year ago

@le_m That assumes the default encoding is utf-8 - which is often true, but is not necessarily the encoding sent over. The correct way to do it is to check the encoding in the headers and apply that.

le_m Over a year ago

@AkshatMahajan You are right, of course, but since OP is just querying google.com we can safely assume UTF-8.

Akshat Mahajan Over a year ago

@le_m You would be surprised what character encodings Google uses in lieu of UTF-8...
