I am trying to get some information from Google Finance, but I am getting this error:

AttributeError: 'HTTPResponse' object has no attribute 'split'

Here is my Python code:

import urllib.request
import urllib
from bs4 import BeautifulSoup

symbolsfile = open("Stocklist.txt")

symbolslist = symbolsfile.read()

thesymbolslist = symbolslist.split("\n")

i=0


while i<len (thesymbolslist):
    theurl = "http://www.google.com/finance/getprices?q=" + thesymbolslist[i] + "&i=10&p=25m&f=c"
    thepage = urllib.request.urlopen (theurl)
    print(thesymbolslist[i] + " price is " + thepage.split()[len(thepage.split())-1])
    i= i+1
  • What are you trying to do here? thepage.split()[len(thepage.split())-1]) Commented May 22, 2016 at 2:19
  • I am trying to get the page into a list and then get the last element from that list and print it. Commented May 22, 2016 at 2:24
  • You need to call read() on thepage to get the actual data. Commented May 22, 2016 at 2:26

2 Answers


The Cause of the Problem

This is because urllib.request.urlopen(theurl) returns an object representing the connection (an HTTPResponse), not a string, so it has no split() method.
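A minimal sketch of that distinction. To keep it runnable offline, it opens a `data:` URL instead of the Google Finance endpoint; that stand-in URL and its two made-up prices are assumptions for illustration. The object returned for a `data:` URL is not literally an `HTTPResponse`, but it behaves the same way in the two respects shown here: it has no `split()`, and its `read()` returns bytes.

```python
import urllib.request

# Assumption: a data: URL stands in for the real endpoint so this runs offline.
# The payload (two made-up prices, one per line) is purely illustrative.
SAMPLE_URL = "data:text/plain;charset=utf-8,101.5%0A102.25%0A"

with urllib.request.urlopen(SAMPLE_URL) as response:
    # The response is a file-like object, not a string, so it has no split().
    has_split = hasattr(response, "split")
    # read() hands back the body as bytes.
    raw = response.read()

print(has_split)
print(type(raw).__name__)

# bytes do have split(), but the pieces are still bytes, which Python 3
# refuses to concatenate with a str.
try:
    message = "price is " + raw.split()[-1]
except TypeError:
    message = None
```

So calling `read()` fixes the `AttributeError`, but the result still has to be decoded before it can be joined to a string.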


The Solution

To read data from this connection and actually get a string, you need to do

thepage = urllib.request.urlopen(theurl).read()

and then the rest of your code should follow naturally.

Addendum to the Solution

In Python 3, read() returns a bytestring (a bytes object) rather than a regular string, whatever the encoding of the page. Concatenating it with a str raises the TypeError reported in the comments.

The right approach to dealing with that is to find the correct character encoding and decode the bytestring into a regular string using it, as seen in this question:

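thepage = urllib.request.urlopen(theurl)
# read the character encoding from the `Content-Type` response header;
# get_content_charset() returns None if none is declared, so fall back to utf-8
charset_encoding = thepage.info().get_content_charset() or 'utf-8'
# decode the raw bytes into a str using that encoding
thepage = thepage.read().decode(charset_encoding)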

It is often safe to assume that the character encoding is utf-8, in which case

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

does work more often than not. It's a statistically good guess if nothing else.
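Putting the pieces together, here is one way the decoding and the "last value" step from the question might look. The helper names `read_text` and `last_field` are hypothetical, and the `data:` URL with made-up prices is again an offline stand-in for the real endpoint, not something from the original code.

```python
import urllib.request


def read_text(response, default_charset="utf-8"):
    """Decode a urlopen() response body into a str.

    Uses the charset declared in the Content-Type header and falls back
    to default_charset when the header declares none.
    """
    charset = response.info().get_content_charset() or default_charset
    return response.read().decode(charset)


def last_field(text):
    """Return the last whitespace-separated token of text, or None if empty."""
    fields = text.split()
    return fields[-1] if fields else None


# Assumption: stand-in URL with illustrative data so the sketch runs offline.
SAMPLE_URL = "data:text/plain;charset=utf-8,101.5%0A102.25%0A"

with urllib.request.urlopen(SAMPLE_URL) as response:
    page = read_text(response)

print("SAMPLE price is " + last_field(page))
```

Indexing with `[-1]` does the same job as `[len(...) - 1]` in the question, and the `None` return avoids an `IndexError` when a response body is empty.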

3 Comments

Once I did that, it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly
It's because read() returns bytes, which Python 3 will not concatenate with a str. Give me a minute to provide a fix.
Your solution is more robust since it does not depend on the source encoding, so OP: better mark this one the right answer :)

Checking the documentation might save you time in the future. It says that urlopen() returns an HTTPResponse object, which has a read() method. In Python 3, read() returns bytes, so you need to decode the output from the source encoding, in this case UTF-8. So just write

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

5 Comments

Once I did that, it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly
Python 3? Then see stackoverflow.com/questions/16699362/… Try thepage = urllib.request.urlopen(theurl).read().decode('utf-8')
@le_m That assumes the default encoding is utf-8 - which is often true, but is not necessarily the encoding sent over. The correct way to do it is to check the encoding in the headers and apply that.
@AkshatMahajan You are right, of course, but since OP is just querying google.com we can safely assume UTF-8.
