I'm trying to scrape LinkedIn using Selenium. Here's an example page: https://www.linkedin.com/vsearch/p?firstName=mark

I can see in the HTML that the search results are inside this element:

<div id='results-col'> ... </div>

but when I try to access this tag using BeautifulSoup:

from bs4 import BeautifulSoup
from selenium import webdriver

# PATH is the location of the PhantomJS executable; url is the search page above
browser = webdriver.PhantomJS(executable_path=PATH)
browser.get(url)
bs_obj = BeautifulSoup(browser.page_source, "html.parser")
results_col = bs_obj.find("div", {"id": "results-col"})

I get nothing (results_col is None). What am I doing wrong?

  • Add a sleep after browser.get() to give the JavaScript time to load. Commented Dec 14, 2016 at 19:32

1 Answer

Wait for the desired element to be present and only then get the page source:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ...
browser.get(url)

wait = WebDriverWait(browser, 10)
wait.until(EC.presence_of_element_located((By.ID, "results-col")))

bs_obj = BeautifulSoup(browser.page_source, "html.parser")
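
Conceptually, an explicit wait like the one above keeps re-checking a condition until it holds or a timeout expires, which is why it tends to be more reliable than a fixed sleep. Below is a minimal, standard-library-only sketch of that polling idea. It is not Selenium's implementation: wait_until, its parameters, and the results_loaded stand-in are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition becoming truthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose results only show up on the third check.
attempts = {"count": 0}


def results_loaded():
    attempts["count"] += 1
    return attempts["count"] >= 3


# Returns as soon as the condition holds instead of sleeping for a fixed time.
wait_until(results_loaded, timeout=5.0, poll_interval=0.01)
```

With a fixed sleep you either wait too long or not long enough; polling returns as soon as the element is there and fails loudly if it never appears.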

3 Comments

I tried your code but I get:

Traceback (most recent call last):
  File X, line 142, in <module>
    print(get_link_to_profile(search_url))
  File X, line 121, in get_link_to_profile
    wait.until(EC.presence_of_element_located((By.ID, "results-col")))
  File "C:\Users\sergeyy\AppData\Roaming\Python\Python35\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Screenshot: available via screen
@BobSacamano that could mean several things, but most likely this element is not on the page that PhantomJS opened. Take a screenshot with the save_screenshot() method after loading the page and see what was actually opened. You might need to start PhantomJS with some arguments to make it work: stackoverflow.com/questions/29463603/….
@BobSacamano or you may need to change the user agent so that PhantomJS pretends to be a different browser: coderwall.com/p/9jgaeq/set-phantomjs-user-agent-string.
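
The two suggestions in these comments (inspect what PhantomJS actually rendered, and present a different user agent) could be combined roughly as follows. This is a sketch under assumptions: it presumes a Selenium release that still ships the PhantomJS driver, the helper names with_user_agent and open_page are hypothetical, and the user-agent string is only an example value. Whether a different user agent changes what LinkedIn serves is not something this sketch can confirm.

```python
def with_user_agent(base_capabilities, user_agent):
    """Return a copy of the capabilities with PhantomJS's user agent overridden."""
    capabilities = dict(base_capabilities)  # copy so the original is not mutated
    capabilities["phantomjs.page.settings.userAgent"] = user_agent
    return capabilities


def open_page(executable_path, url, user_agent, screenshot_path="page.png"):
    """Open url in PhantomJS with a custom user agent and save a screenshot."""
    # Imported lazily so with_user_agent() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    capabilities = with_user_agent(DesiredCapabilities.PHANTOMJS, user_agent)
    browser = webdriver.PhantomJS(
        executable_path=executable_path,
        desired_capabilities=capabilities,
    )
    browser.get(url)
    # Shows what PhantomJS actually rendered (for example, a login page).
    browser.save_screenshot(screenshot_path)
    return browser


# Example value only; any desktop browser's user-agent string could be tried.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
)
```

Calling open_page(PATH, url, EXAMPLE_USER_AGENT) and then opening page.png should show whether the results column was rendered at all before you wait for it.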
