0

I am trying to download the html of a page that is requested through javascript and normally, by clicking a link in the browser. I can download the first page because it has a general URL:

http://www.locationary.com/stats/hotzone.jsp?hz=1

But there are links along the bottom of the page that are numbers (1 to 10). So if you click on one, it goes to, for example, page 2:

http://www.locationary.com/stats/hotzone.jsp?ACTION_TOKEN=hotzone_jsp$JspView$NumericAction&inPageNumber=2

When I put that URL into my program and try to download the html, it gives me the html of a different page on the website and I think it is the home page.

How can I get the html of this URL that uses javascript and when there is no specific URL?

Thanks.

P.S. I am using urllib/urllib2 and cookielib.

Also, I just found something called PyQuery? Could I use that? And how would I do it?

Code:

import urllib
import urllib2
import cookielib
import re

URL = ''

def load(url):

    data = urllib.urlencode({"inUserName":"email", "inUserPass":"password"})
    jar = cookielib.FileCookieJar("cookies")
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    opener.addheaders.append(('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1'))
    opener.addheaders.append(('Referer', 'http://www.locationary.com/'))
    opener.addheaders.append(('Cookie','site_version=REGULAR'))
    request = urllib2.Request("https://www.locationary.com/index.jsp?ACTION_TOKEN=tile_loginBar_jsp$JspView$LoginAction", data)
    response = opener.open(request)
    page = opener.open("https://www.locationary.com/index.jsp?ACTION_TOKEN=tile_loginBar_jsp$JspView$LoginAction").read()

    h = response.info().headers
    jsid = re.findall(r'Set-Cookie: (.*);', str(h[5]))
    data = urllib.urlencode({"inUserName":"email", "inUserPass":"password"})
    jar = cookielib.FileCookieJar("cookies")
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    opener.addheaders.append(('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1'))
    opener.addheaders.append(('Referer', 'http://www.locationary.com/'))
    opener.addheaders.append(('Cookie','site_version=REGULAR; ' + str(jsid[0])))
    request = urllib2.Request("https://www.locationary.com/index.jsp?ACTION_TOKEN=tile_loginBar_jsp$JspView$LoginAction", data)
    response = opener.open(request)
    page = opener.open(url).read()
    print page

load(URL)
1
  • Could you show some code you are using? Commented Aug 15, 2012 at 1:41

2 Answers 2

1

I haven't used it myself, but I have heard good things about Requests.

Sign up to request clarification or add additional context in comments.

Comments

0

See the project Splinter, maybe It's useful: http://splinter.cobrateam.info/

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.