I am using the code snippet below to extract the "Locations" link with Selenium WebDriver in Python, but I can only extract the text ("Locations"), not the link itself. Can anyone help me with this?

Link to extract from: https://www.thomasnet.com/company/siemens-corporation-10035100/profile?cov=NA&which=comp&what=Siemens+Corporation&cid=10035100&searchpos=1

Code Snippet used:

lnk_content = driver.find_element(By.XPATH,"//*[@id='__next']/div/div[2]/div/div[1]/div/div/button/span")
lnk = lnk_content.get_attribute("href")
print(lnk)
  • There is no href attribute in the targeted span element. Obviously nothing will be extracted. Commented Dec 5, 2023 at 11:40
  • But there is a link in the "Locations" button, is there a way to extract that? Commented Dec 5, 2023 at 12:48
  • Where is the link? Do you see it in the HTML DOM? Commented Dec 5, 2023 at 12:50
  • Clicking that link works because a click event listener is attached to the <button> element that contains the Locations text. The HTML itself contains no link to extract. This is probably designed explicitly to prevent web scraping. Using Selenium, you could perhaps click the button and then read the URL of the page that gets loaded. Commented Dec 5, 2023 at 13:16

2 Answers

I agree with larks's comment. The code below clicks the Locations element and then reads the URL that gets loaded.

Code:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.thomasnet.com/company/siemens-corporation-10035100/profile?cov=NA&which=comp&what=Siemens+Corporation&cid=10035100&searchpos=1")
driver.maximize_window()
wait = WebDriverWait(driver, 10)

wait.until(EC.element_to_be_clickable((By.XPATH, "(//span[text()='Locations'])[2]"))).click()
time.sleep(5)
location_url = driver.current_url
print(location_url)

Console:

https://www.thomasnet.com/company/siemens-corporation-10035100/branches?pg=1

Process finished with exit code 0
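
The fixed time.sleep(5) above may wait longer than needed, or not long enough on a slow connection. As a rough sketch, the helper below polls driver.current_url until it differs from the URL seen before the click. The name wait_for_url_change and its timeout and poll parameters are hypothetical, not part of Selenium; Selenium's own WebDriverWait combined with expected_conditions.url_changes should serve the same purpose.

```python
import time


def wait_for_url_change(driver, old_url, timeout=10.0, poll=0.25):
    """Poll driver.current_url until it differs from old_url, then return it.

    Hypothetical helper: `driver` only needs a `current_url` attribute,
    which a Selenium WebDriver provides.
    Raises TimeoutError if the URL has not changed within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = driver.current_url
        if current != old_url:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("URL did not change within %s seconds" % timeout)
        time.sleep(poll)


# Intended use with the answer's code (assumes `driver` and `wait` exist):
#   old_url = driver.current_url
#   wait.until(EC.element_to_be_clickable(
#       (By.XPATH, "(//span[text()='Locations'])[2]"))).click()
#   location_url = wait_for_url_change(driver, old_url)
#   print(location_url)
```

Recording old_url before the click matters: if it were read afterwards, the page might already have navigated and the helper would wait for a change that never comes.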

Clicking on that link makes new content visible in the existing document. You can click on the link with code like this:

lnk = driver.find_element(By.XPATH,'//button[span[text() = "Locations"]]')

# See https://stackoverflow.com/a/56194349/147356
driver.execute_script("arguments[0].click();", lnk) 

Then you can retrieve the location table:

locations = driver.find_element(By.XPATH, "//div[h2[text() = 'Locations']]/following-sibling::div/table")

And iterate over the rows:

for row in locations.find_elements(By.TAG_NAME, 'tr'):
  ...
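
The loop body is left open above. One possible way to fill it in, as a sketch only: collect the text of each row's cells into a list of lists, then turn that into dictionaries keyed by the header row. The function name table_to_records is hypothetical, and treating the first non-empty row as the header is an assumption, since the table's actual columns are not shown here.

```python
def table_to_records(cell_rows):
    """Convert [[header, ...], [cell, ...], ...] into a list of dicts.

    Assumption: the first non-empty row holds the column headers.
    Rows without any cells are skipped.
    """
    rows = [[text.strip() for text in row] for row in cell_rows if row]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Building cell_rows from the `locations` element found above
# (assumes `locations` and `By` are already in scope):
#   cell_rows = [
#       [cell.text for cell in row.find_elements(By.CSS_SELECTOR, "th, td")]
#       for row in locations.find_elements(By.TAG_NAME, "tr")
#   ]
#   for record in table_to_records(cell_rows):
#       print(record)
```

Keeping the conversion separate from the Selenium calls means it can be checked with plain lists, without opening a browser.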
