5

I am scraping this webpage for usernames which loads the users after scrolling

Url to page : "http://www.quora.com/Kevin-Rose/followers"

I know the number of users on the page (in this case no. is 43812) How can I scroll the page till all the users are loaded? I have searched for the same on the internet and everywhere I got almost same line of code for doing it which is:

driver.execute_script("window.scrollTo(0, )")

How can I determine the vertical position to ensure that all the users are loaded? Is there any other option to achieve the same thing without actually scrolling?

   from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import urllib

driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
time.sleep(10)

wait = WebDriverWait(driver, 10)

form = driver.find_element_by_class_name('regular_login')
time.sleep(10)
#add explicit wait

username = form.find_element_by_name('email')
time.sleep(10)
#add explicit wait

username.send_keys('[email protected]')
time.sleep(30)
#add explicit wait

password = form.find_element_by_name('password')
time.sleep(30)
#add explicit wait

password.send_keys('def')
#add explicit wait

password.send_keys(Keys.RETURN)
time.sleep(30)

#search = driver.find_element_by_name('search_input')
search = wait.until(EC.presence_of_element_located((By.XPATH, "//form[@name='search_form']//input[@name='search_input']")))

search.clear()
search.send_keys('Kevin Rose')
search.send_keys(Keys.RETURN)

link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Kevin Rose")))
link.click()
#Wait till the element is loaded (Asynchronusly loaded webpage)

handle = driver.window_handles
driver.switch_to.window(handle[1])
#switch to new window 

element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Followers")))
element.click()
2
  • There are certainly options. Please show the complete code you have now (including scrolling part). Thanks. Commented Sep 16, 2014 at 14:05
  • I dont think its of any use but I have added the code. This is just code to log into the site and navigate to particular page. I dont know what to add in y coordinate position? Commented Sep 16, 2014 at 14:11

1 Answer 1

4

Since there is nothing special appearing after the last followers bucket is loaded, I would rely on the fact that you know how many followers does the user have and you know how many are loaded on each scroll down (I've inspected - it is 18 per scroll). Hence, you can calculate how many times do you need to scroll the page down.

Here's the implementation (I've used a different user with only 53 followers to demonstrate the solution):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

followers_per_page = 18

driver = webdriver.Chrome()  # webdriver.Firefox() in your case
driver.get("http://www.quora.com/Andrew-Delikat/followers")

# get the followers count
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
followers_count = int(element.text.replace(',', ''))
print followers_count

# scroll down the page iteratively with a delay
for _ in xrange(0, followers_count/followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 10000);")
    time.sleep(2)

Also, you may need to increase this 10000 Y coordinate value based on the loop variable in case there is a big number of followers.

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks a lot !! Right now I am trying the following script which appears to work perfectly driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
^Nope. Code which I mentioned above didnt load all the users.
@Siddhesh thank you for the another interesting challenge. Sorry, I didn't quite get - does it work for you?
Yes it worked. Thanks again for putting so much efforts.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.