I am trying to crawl a website with Python and collect its users' info. When I download the page source, it differs from what I see in Chrome's Inspect Element. After some searching it seems I should use Selenium, but I don't know how to use it. This is the code I have; driver.page_source still matches what Chrome shows as the page source rather than what Inspect Element shows. I would really appreciate it if someone could help me fix this.

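import os
from selenium import webdriver

# Path to the local chromedriver binary
chromedriver = "/Users/adam/Downloads/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://www.tudiabetes.org/forum/users/Bug74/activity")
# Read the source before quitting; this is what I compare with Inspect Element
html = driver.page_source
driver.quit()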

1 Answer

It's called XHR (XMLHttpRequest).
The content you see does not come from the page load itself. Your URL only loads the structure of the page; the meat of the page comes from a different source, fetched via XHR as a JSON-formatted string.

You should really consider using requests and bs4 to query this page instead.
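
As a rough illustration of that suggestion, here is a minimal sketch using requests and bs4. The helper names fetch and inline_scripts are made up for this example, and it assumes the data you want is embedded in an inline script tag of the downloaded page, which may not hold for every page on the site.

```python
import requests
from bs4 import BeautifulSoup

# URL taken from the question
URL = "http://www.tudiabetes.org/forum/users/Bug74/activity"


def fetch(url):
    """Download the raw HTML of a page (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def inline_scripts(html):
    """Return the text of every inline <script> element (hypothetical helper).

    Script tags that only reference an external file have no text
    and are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [tag.string for tag in soup.find_all("script") if tag.string]
```

Calling inline_scripts(fetch(URL)) should give you the script bodies, which you can then search for the JSON you are after. Note that this parses the raw download, so it will not match the Inspect Element view, which shows the DOM after JavaScript has run.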

3 Comments

Thanks for your reply. But it's still not similar to the Inspect Element code. I'm trying to find the date the user joined, but I can't.
@Erin the date is in the JavaScript section of the page: created_at":"2009-07-15T23. I suggest you study requests and bs4.
Thanks @taesu. This solved the problem of seeing the joined date. I was hoping to get something similar to the HTML code in Inspect Element though.
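
For the joined date discussed in the comments above, a minimal sketch of pulling created_at values out of the downloaded page text with a regular expression. The function name find_created_at is made up here, and the pattern assumes the value appears as a plain "created_at":"..." pair in the embedded JSON, as the comment suggests; if the JSON is escaped or nested differently, the pattern would need adjusting.

```python
import re

# Matches "created_at":"<value>" and captures the value.
# Assumes the key and value are both double-quoted, as in the comment above.
CREATED_AT = re.compile(r'"created_at"\s*:\s*"([^"]+)"')


def find_created_at(html):
    """Return every created_at value found in the page text, in order."""
    return CREATED_AT.findall(html)
```

A page may contain several created_at fields (for posts as well as the user), so you will probably need to check which match belongs to the user record.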
