1

short: How to execute/simulate javascript redirection with python Mechanize?

location.href="http://www.site2.com/";

I've made a python script with mechanize module that looks for a link in a page and follows it.

The problem is on a particular site that when I do

br.follow_link("http://www.address1.com") 

he redirects me to this simple page:

<script language="JavaScript">{                                                                                         
    location.href="http://www.site2.com/";                                                                                           
    self.focus();                                                                                                                   
    }</script>

Now, if I do:

br = mechanize.Browser(factory=mechanize.RobustFactory())

... #other code

br.follow_link("http://www.address1.com") 
for link in br.links():   
br.follow_link(link)
print link

it doesn't prints anything, that means that there is no link in that page. But if I manually parse the page and I execute:

br.open("http://www.site2.com")

Site2 doesn't recognizes that I'm coming from "www.address1.com" and the script does not work as I would like!

Sorry if it's just a newbie question and thank you in advance!

p.s. I have br.set_handle_referer(True)

EDIT: more info: Inspecting that link with Fiddler2 it looks like:

GET http://www.site2.com/ HTTP/1.1 Host: www.site2.com Connection: keep-alive User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8 Referer: http://www.address1.com Accept-Encoding: gzip,deflate,sdch Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3 Cookie: PHPSESSID=6e161axxxxxxxxxxx; user=myusername;
pass=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; ip=79.xx.xx.xx;
agent=a220243a8b8f83de64c6204a5ef7b6eb; __utma=154746788.943755841.1348303404.1350232016.1350241320.43; __utmb=154746788.12.10.1350241320; __utmc=154999999; __utmz=154746788.134999998.99.6.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=%something%something%

so it seems to be a cookie problem?

4
  • Quick check, you aren't mechanizing this to abuse some referral bonus? Commented Oct 14, 2012 at 10:49
  • Do the URLs change or are they always the same? Commented Oct 14, 2012 at 10:50
  • No, I'm not abusing any referral. Moreover: the address in "location.href"doesn't change, but the other one yes. Short: www.site2.com is static, www.address1.com is dynamic. Thank you for answers! Commented Oct 14, 2012 at 18:54
  • Cool that means you won't have to scrape site2's URL out the Javascript. Commented Oct 14, 2012 at 19:01

4 Answers 4

1

Mechanize can't deal with JavaScript, since it can't interpret it, try parsing your site manually and passing this link to, br.follow_link.

Sign up to request clarification or add additional context in comments.

1 Comment

follow_link() wants an Absolute_Url as parameter, so it seems that I can't simply do something like br.follow_link("google.com") thank you for the answer!
1

I solved it! in this way:

    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    ...

    br.follow_link("www.address1.com")
    refe= br.geturl()
    req = urllib2.Request(url='www.site2.com')
    req.add_header('Referer', refe)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj) )
    f = opener.open(req) 
    htm = f.read()
    print "\n\n", htm

1 Comment

So it was a cookie problem?
0

How about

br.open("http://alpha.com") 

br.follow_link("http://beta.com")

If you use br_follow_link hopefully that sets the HTTP referrer with the previous page. Whereas if you dobr.open that's like opening a new window, it doesn't set the HTTP referrer header.


Edit. Ok it looks like .follow_link doesn't take strings but takes a special mechanize.Link object with a property .absolute_url. You can fake that.

>>> class Fake:
...     pass
...
>>> x = Fake()
>>> x.absolute_url="http://stackoverflow.com"
>>> br.follow_link(x)
<response_seek_wrapper at 0x2937af8 whose wrapped object = <closeable_response at 0x2937f08 whose fp = <socket._fileobject object at 0x02934970>>>
>>> br.title()
'Stack Overflow'

or make a real mechanize.Link which is less hacky but more tedious.

3 Comments

follow_link() wants an Absolute_Url as parameter, so it seems that I can't simply do something like br.follow_link("google.com"). But anyway really thank you for the answer!
Shucks, it ought to work like that. Have another idea (see edited answer)
It doesn't works as needed. In fact, it works simply like br.open, so site2.com doesn't see that I come from address1.com
0

You could set the HTTP referrer header explicitly before making your request

br.addheaders = [('Referer', 'http://alpha.com')]
br.open("http://beta.com")

More details in the surprisingly difficult to find official docs http://wwwsearch.sourceforge.net/mechanize/doc.html

1 Comment

This seems to be the way to go. But it doesn't work. I've added more info at the end of my answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.