I'm trying to use urllib2 to fetch a webpage from a website. After I managed to log in and retrieve the page, I found that it contains some <script>.....</script> blocks. How can I save the rendered output (the complete content of the webpage as a browser would show it, not the script source)?
- Are you saying you'd like to save the content of the page after any included JavaScript has been run? – Matt Luongo, Feb 4, 2012 at 17:42
- Are you doing this for testing, screen-scraping for an application, or what? In general, with JavaScript it's the browser that creates the page content, so you need a real browser to duplicate that... – Bill Gribble, Feb 4, 2012 at 17:44
- @MattLuongo Yes, I'm trying to pull some of my personal messages from a website which doesn't offer an API. – Terry Shi, Feb 4, 2012 at 17:47
2 Answers
I'd also like to mention pywebkitgtk (which I've been using a lot lately as an embedded browser), and Selenium.
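For the Selenium route, a minimal sketch might look like the following. It assumes Python 3, the `selenium` package, and a locally installed Firefox with its driver; the helper names `save_rendered_page` and `save_with_firefox` are made up for illustration and are not part of Selenium. Note that `page_source` returns the DOM as it stands when it is read, so a page that loads content late may need an explicit wait first.

```python
def save_rendered_page(driver, url, out_path):
    """Load `url` in a browser driver and write the rendered HTML to `out_path`.

    `driver` is expected to behave like a Selenium WebDriver: it needs a
    `get(url)` method and a `page_source` attribute.
    """
    driver.get(url)            # the browser fetches the page and runs its scripts
    html = driver.page_source  # serialized DOM at the moment of reading
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


def save_with_firefox(url, out_path):
    """Hypothetical convenience wrapper that drives a real Firefox instance."""
    # Imported here so the helper above can be used (and tested)
    # without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        return save_rendered_page(driver, url, out_path)
    finally:
        driver.quit()          # always close the browser, even on errors
```

Calling `save_with_firefox("http://example.com/inbox", "inbox.html")` (a placeholder URL) would open Firefox, load the page, and save what the browser rendered. For a site that requires a login, the login form would have to be filled in through the same driver before the page is saved, since cookies from a separate urllib2 session are not shared with the browser.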
1 Comment
- Selenium with an actual browser driver is very useful; it can mimic most human interactions. – Terry Shi