28

I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM Events and AJAX, and so far I've found no way to do that.

I looked at some Python client browsers that are supposed to support JavaScript, like Spynner and Zope, but neither really works for me. Spynner crashes PyQt all the time, and Zope doesn't seem to support JavaScript at all.

Is there a way to simulate browsing with Python only (no extra processes), in the way WATIR or libraries that manipulate Firefox or Internet Explorer do, while fully supporting JavaScript as if actually browsing the page?

4
  • 1
    The Zope test browser (built on mechanize) never claimed to support JavaScript; where did you read that it might? Commented Apr 26, 2011 at 18:21
  • 1
    Could you explain the problem you're trying to solve? It could be that you may not need JavaScript enabled after all. Commented Apr 26, 2011 at 18:33
  • Tell us what you're trying to do and we'll tell you if we can help you! Commented Apr 26, 2011 at 19:49
  • I'm trying to simulate browsing using strictly python. I can't use anything else because I need to use some specific tweaks and hooks that I can (currently) only do in python. I'm willing to even put in effort and try and bridge Mechanize and PyV8, but I have no idea where to start... Has anyone ever done anything like that before? Commented Apr 28, 2011 at 6:51

5 Answers

24

I've played with a new alternative to Mechanize (which I love) called PhantomJS.

It is a full WebKit browser like Safari or Chrome, but it is headless and scriptable. You script it with JavaScript, not Python (as far as I know at least).

There are some example scripts to get you started. It's a lot like using Firebug. I've only spent a few minutes using it, but I found I was quite productive right from the start.


5 Comments

Nice tool! Why on earth do people downvote without explanation?
It's because 1) it's a Javascript tool when the question explicitly asks for a Python tool and 2) manipulating that tool via the JS API from Python would be a hacky PITA at best.
+1 I think PhantomJS is the way to go, and JavaScript is the language of the web.
Does PhantomJS actually run the javascript that's on the pages it loads? (As distinct from the javascript in the phantomjs script.) I think it does, but it's hard to tell for sure.
Yes, PhantomJS runs the page just like a regular web-browser does, though without a UI.
16

From http://wwwsearch.sourceforge.net/mechanize/faq.html#general

If you come across this in a page you want to automate, you have four options. Here they are, roughly in order of simplicity.

1. Figure out what the JavaScript is doing and emulate it in your Python code: for example, by manually adding cookies to your CookieJar instance, calling methods on HTMLForms, calling urlopen, etc. See above re forms.

2. Use Java’s HtmlUnit or HttpUnit from Jython, since they know some JavaScript.

3. Instead of using mechanize, automate a browser instead. For example use MS Internet Explorer via its COM automation interfaces, using the Python for Windows extensions, aka pywin32, aka win32all (e.g. simple function, pamie; pywin32 chapter from the O’Reilly book) or ctypes (example). This kind of thing may also come in useful on Windows for cases where the automation API is lacking. For Firefox, there is PyXPCOM.

4. Get ambitious and automatically delegate the work to an appropriate interpreter (Mozilla’s JavaScript interpreter, for instance). This is what HtmlUnit and httpunit do. I did a spike along these lines some years ago, but I think it would (still) be quite a lot of work to do well.
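The first option (emulating the JavaScript by hand) could look roughly like the sketch below. It uses only the standard library's `http.cookiejar` and `urllib.request` rather than mechanize itself, and everything specific in it is an assumption for illustration: the cookie name `js_token`, its value, the `example.com` endpoint, and the JSON payload are all hypothetical stand-ins for whatever the page's script actually does.

```python
import http.cookiejar
import json
import urllib.request


def make_cookie(name, value, domain, path="/"):
    """Build a cookie like one a page script would set via document.cookie."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def build_ajax_request(url, payload):
    """Build the POST request that the page's JavaScript would have sent."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Many sites use this header to recognise AJAX calls (assumption).
            "X-Requested-With": "XMLHttpRequest",
        },
        method="POST",
    )


# Hypothetical cookie and endpoint, standing in for what the script does.
jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("js_token", "abc123", "example.com"))
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

request = build_ajax_request("http://example.com/api/items", {"page": 2})
# opener.open() attaches cookies automatically; done by hand here so the
# resulting header can be inspected without touching the network.
jar.add_cookie_header(request)
print(request.get_header("Cookie"))
# response = opener.open(request)  # uncomment to actually send the request
```

The usual workflow is to watch the browser's network tab, note which cookies and requests the script produces, and reproduce only those. This breaks whenever the site changes its script, which is the main cost of this option.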

3 Comments

Options #1 and #3 are purely python.
#1 is indeed python, and that might be what I'll go with, but I'm also interested in how I can generalize the process in the future. #3 is really using COM and IE...
@Jeff why is there a problem with using a headless browser or a browser automater?
6

Basically, if you want something that deals with JavaScript then you need a real JavaScript engine, and that invariably means automating a real browser (I'm including headless ones in this).

Java’s HtmlUnit doesn't do a very good job, as it doesn't use a JavaScript engine from an actual browser. PhantomJS sounds ideal (as newz2000 points out); however, I find that when manipulating pages with JavaScript it can be very difficult to debug your script if you can't actually see the page you're dealing with.

This leads to solutions such as Selenium WebDriver, which has a full Python API to automate various browsers. However, you must run a Java jar and it actually launches the browser, so it is not the pure Python solution you're after (but I think this is as close as you can get).

2 Comments

I've used Selenium to automate Firefox via the Python API. It's a little buggy, but it generally works, and is probably the best solution I've seen.
I too resorted to Selenium to automate web browsing for a project where running JavaScript was required. For local development I used chromedriver and for production I used Selenium Server. The Selenium Python binding docs are fairly helpful.
4

You can use Selenium with Python. You can then scrape JavaScript-generated content as well as manipulate the page with additional JavaScript (as well as Python).

# In your virtualenv: pip install selenium
from selenium import webdriver
from selenium.webdriver.common.by import By

# Launch Firefox GUI
browser = webdriver.Firefox()

# Alternatively, you can drive PhantomJS without a GUI
# With Node.js installed: `npm install -g phantomjs`
# (webdriver.PhantomJS was removed in Selenium 4; older releases only)
# browser = webdriver.PhantomJS()

# Fetch a webpage
browser.get('http://example.com')

# If you need the whole HTML document
# just like inspecting the rendered page with the console
html = browser.page_source

# Get an element, even if it was created with JS
# (find_element_by_css_selector was removed in Selenium 4; use By instead)
button = browser.find_element(By.CSS_SELECTOR,
                              'div.some-class > input.the-submit-button')

# Click on something
button.click()

# Execute some JavaScript (assumes jQuery is loaded on the page)
browser.execute_script("$('html, body').animate({ scrollTop: 500 }, 50);")

You can run the code in a Python REPL and use autocomplete to discover the methods available on browser or whatever element you have selected. Or do something like print(dir(browser)) to see what is available.
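As a small sketch of that `dir()` approach, the helper below narrows the listing to public methods. `public_methods` and `FakeBrowser` are hypothetical names introduced here; `FakeBrowser` is only a stand-in so the snippet runs without Selenium or a browser installed.

```python
def public_methods(obj):
    """Return the sorted names of obj's public methods.

    Attributes are looked up on the class rather than the instance, so
    properties are not evaluated as a side effect of listing them.
    """
    cls = type(obj)
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(cls, name, None))
    )


class FakeBrowser:
    """Hypothetical stand-in for a WebDriver instance."""
    page_source = "<html></html>"  # plain attribute, not a method

    def get(self, url):
        pass

    def execute_script(self, script):
        pass

    def _private_helper(self):
        pass


print(public_methods(FakeBrowser()))  # ['execute_script', 'get']
```

With a real session you would call `public_methods(browser)` or `public_methods(button)` instead.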


3

An example of how to use PyV8 to run JS on a DOM with Python can be found here:

https://github.com/buffer/thug

It should be fairly easy to make this run together with mechanize.

