
I'm trying to webscrape this site: https://www2.tse.or.jp/tseHpFront/JJK020010Action.do

Using the Selenium package, with Google Chrome as my browser, I'm able to open it, choose some settings, and run a search. The problem is that the results span 21 pages and I need to gather all of them, but my code cannot find the link that clicks through to the next page. This is the next button's HTML:

<div class="next_e">
   <a href="javascript:setPage(2);submitPage(document.JJK020030Form, document.JJK020030Form.Transition);">
      <img src="/common/images/spacer.gif"  width="77"  height="24"  alt="Next">
   </a>
</div>

Note -- the number in the parentheses after setPage corresponds to the next page number, so if I'm on page 1 the code reads setPage(2), and so on.
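
Since the href is plain JavaScript, in principle the same page transition can be triggered without locating the link at all (a sketch, assuming the driver is already sitting on a results page; setPage and submitPage are functions defined by the site itself, and the form name comes from the snippet above):

# run the page's own pagination JavaScript directly instead of clicking the link
driver.execute_script(
    "setPage(2); submitPage(document.JJK020030Form, document.JJK020030Form.Transition);"
)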

Here is my complete code for the webscrape:

driver.get("https://www2.tse.or.jp/tseHpFront/JJK020030Action.do")
sleep(20)
data = []

button = driver.find_element_by_name("dspSsuPd")
#driver.find_elements_by_class_name
button200 = Select(button)
button200.select_by_value('200')

sleep(10)

# tick each product-category checkbox by its value attribute
checkboxes = ['001', '002', '004', '006', '008', '101', '102', '104', 'ETF', 'ETN', 'RET', 'PSC', '999']
for box in checkboxes:
    driver.find_element_by_xpath(f"//input[@value='{box}']").click()

search_button = "//*[@class='activeButton' and @value='Start of search']"
driver.find_element(By.XPATH, search_button).click()
sleep(20)

soup1 = BeautifulSoup(driver.page_source, 'lxml')
tables1 = soup1.find_all('table')
df = pd.read_html(driver.page_source)[-1]
data.append(df)

for i in range(2, 22):  # pages 2 through 21
    
## right here is where I'm encountering my issue ##
    next_href = f"//*[@class='next_e' and @href ='javascript:setPage({i});submitPage(document.JJK020030Form, document.JJK020030Form.Transition);']"
    driver.find_element(By.XPATH, next_href).click()
    sleep(10)

    soup = BeautifulSoup(driver.page_source, 'lxml')
    tables = soup.find_all('table')
    df1 = pd.read_html(driver.page_source)[-1]
    data.append(df1)

driver.quit()
df_data = pd.DataFrame(pd.concat(data)).reset_index(drop=True)
print(df_data)
df_data.to_csv('companies_data_borse_frankfurt.csv', index=False)

The other options I have tried to click this href (none of which worked) include:

driver.find_element(By.XPATH, "//div[@class='next_e']/a[contains(., 'setPage')]").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='next_e']/a[contains(., 'setPage')]"))).click()
driver.find_element_by_xpath(f'//input[@href="javascript:setPage({i});submitPage(document.JJK020030Form, document.JJK020030Form.Transition);"]').click()
driver.find_element_by_partial_link_text(f'javascript:setPage({i})')

Please let me know if you have a solution or need further clarification on the issue. Thank you!


1 Answer

wait = WebDriverWait(driver, 60)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.next_e>a"))).click()

Using this works just fine for going through the pages. (The XPath attempts in the question fail because the next_e class sits on the <div> while the href sits on the child <a>, so no single element matches both predicates; div.next_e>a targets the anchor directly.)

Import:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
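
For reference, a minimal sketch of how the wait-and-click can slot into the question's pagination loop (assuming driver is already sitting on the first results page, and reusing the pandas read_html approach from the question):

from time import sleep

import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 60)
data = [pd.read_html(driver.page_source)[-1]]  # page 1 is already loaded

for _ in range(20):  # pages 2 through 21
    # the Next link is re-rendered on every page, so locate it fresh each time
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.next_e>a"))).click()
    sleep(10)  # give the next page time to load before parsing
    data.append(pd.read_html(driver.page_source)[-1])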

2 Comments

You could trim some code off, since starting on page 1 and then incrementing would just be better here.
Unfortunately I can't. This site doesn't work that way :/
