I need to do some screen scraping on a web page where the content I need is generated by AJAX. On the initial page there is a table with 4 tabs. When you click on any of the tabs the content of the table changes. I need the content from the 3rd tab only. I have used the google chrome 'Inspect Element' tool to see what the requests and post data was and I can get the information I need when I put the information (session id and a lot of other cookie data as well as post data) from the inspect element result into a PHP curl request. But this only works for the 30 minutes that the session lasts. Does anyone know of a way I can get to this information?
2 Answers
I wont reproduce the code here but I will point you to the answer. Its within this book:
http://www.amazon.com/Webbots-Spiders-Screen-Scrapers-Developing/dp/1593273975/ref=dp_ob_image_bk
A must buy for someone doing what your doing.
1 Comment
m_grif
Thanks Aaron, i'll check it out.
In the end I used htmlunit to get the content I needed. I also found the HTMLUnit Scripter very useful to help generate the Java code required.
2 Comments
Khom Nazid
The question is about PHP. Your answer is about Java.
m_grif
@KhomNazid yes I asked the question :) I answered it saying what I eventually used to get what I needed.