How to get the correct HTML from a specific URL (Python)

Question

I'm trying to write code that verifies a domain through whois.domaintools.com.

But there's a problem when reading the HTML: what I get does not match the page source of whois.domaintools.com/notregistereddomain.com. What's wrong? Is it a problem with the request, or something else? I don't know how to solve it.

import urllib2

def getPage():
    url = "http://whois.domaintools.com/notregistereddomain.com"

    req = urllib2.Request(url)

    try:
        response = urllib2.urlopen(req)
        return response.read()
    except urllib2.HTTPError as error:
        # Read the error body once; a second read() would return an empty string.
        body = error.read()
        print "error: ", body
        f = open("URL.txt", "a")
        f.write(body)
        f.close()


if __name__ == "__main__":
    namesPage = getPage()
    print namesPage

Schnouki · Accepted Answer · 2011-06-08 11:23:25Z


If you use print error instead of print error.read(), you'll see that you're getting an HTTP 403 "Forbidden" response from the server.
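To make the status visible before looking at the body, a handler can report the error's code and reason. The following is a minimal sketch in Python 3, where urllib2 was split into urllib.request and urllib.error. The helper name describe_http_error is hypothetical, and the error object is built by hand (an assumption for illustration) so the example runs without contacting the server:

```python
import io
from email.message import Message
from urllib.error import HTTPError


def describe_http_error(error):
    """Return a one-line summary: status code and reason phrase."""
    return "HTTP %d %s" % (error.code, error.reason)


# Hand-built stand-in for what urlopen() might raise on a 403.
# A real error would be created by urllib from the server's response.
fake_error = HTTPError(
    "http://whois.domaintools.com/notregistereddomain.com",
    403,
    "Forbidden",
    Message(),
    io.BytesIO(b"<html>blocked</html>"),
)

summary = describe_http_error(fake_error)
body = fake_error.read()  # the body stream can only be consumed once
print(summary)
```

In a real script the same helper would be called inside an except HTTPError block around urlopen().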

Apparently this server doesn't like requests without a User-Agent header (or it doesn't like Python's default one, because it doesn't want to be queried from a script). Here's a workaround:

import urllib2

url = "http://whois.domaintools.com/notregistereddomain.com"
# Any valid user-agent string from a real browser should work here.
user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
headers = {"User-Agent": user_agent}
req = urllib2.Request(url, headers=headers)
res = urllib2.urlopen(req)
print res.read()
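For readers on Python 3, the same workaround might look like the sketch below, which uses urllib.request in place of urllib2. The names build_request, fetch, and DEFAULT_USER_AGENT and the timeout value are assumptions introduced for illustration; the user-agent string is the one from the answer. Building the request is kept separate from sending it, so the header can be checked without any network access:

```python
import urllib.request

# Hypothetical constant; the string itself is taken from the answer above.
DEFAULT_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Build a Request carrying an explicit User-Agent header (no network I/O)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Send the request and return the raw response body as bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as res:
        return res.read()


req = build_request("http://whois.domaintools.com/notregistereddomain.com")
# urllib normalizes header names, so the key is stored as "User-agent".
print(req.get_header("User-agent"))
```

Calling fetch(url) performs the actual request. Whether the server accepts a given user-agent string is up to the server, so the HTTPError handling from the question is still worth keeping around that call.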

