1

I use Selenium in Python for scraping. I can't get certain values even though they are displayed in the browser.

So I checked the HTML source and found that the values are not in the HTML, as shown below.

HTML

<div id="pos-list-body" class="list-body">

</div>

But the values are there when I inspect the page with Chrome's developer tools.

DevTools

<div id="pos-list-body" class="list-body">
    <div class="list-body-row" id="pos-row-1">
        <div class="pos-list-col-1">
            <input class="list-checkbox" type="checkbox" value="1">
        </div>
        <div class="detail-data pos-list-col-2">
            1
        </div>
        <div class="detail-data pos-list-col-3">
            a
        </div>
        ...
    </div>
    <div class="list-body-row" id="pos-row-2">
        <div class="pos-list-col-1">
            <input class="list-checkbox" type="checkbox" value="2">
        </div>
        <div class="detail-data pos-list-col-2">
            2
        </div>
        <div class="detail-data pos-list-col-3">
            b
        </div>
        ...
    </div>
    ...
</div>

It seems that these values are generated by JavaScript or something similar.

There is no iframe in the source code.

How can I get these values with Python?

I would appreciate any hints.

6
  • Do you mean that the elements show up after the page loads? Try adding a time.sleep to wait before extracting the elements. Commented Apr 27, 2022 at 2:12
  • No, the elements in the HTML are always blank after the page loads, but they exist in DevTools. So the result is the same with a time.sleep. Thank you for your comment. Commented Apr 27, 2022 at 2:43
  • What website is this from? Commented Apr 27, 2022 at 3:58
  • I'm afraid I can't show you the website because the page I'm asking about requires a login. Thank you for your comment. Commented Apr 27, 2022 at 4:54
  • @SamuraiBlue which webdriver are you using with Python Selenium? If you're using a non-headless browser, have you tried inspecting the page in the same browser using the same credentials? Selenium should be able to get the injected HTML on the page, but there may be something about your script implementation that's preventing the HTML from being injected into the #pos-list-body element. If you're not using a headless browser, it could be helpful to watch the script execution as it runs and see if you can gain additional insight from that. Commented May 3, 2022 at 18:42

3 Answers

1

If the ID pos-list-body is unique in the HTML DOM, then your best bet is to use an explicit wait and read innerText.

Code:

wait = WebDriverWait(driver, 20)
print(wait.until(EC.presence_of_element_located((By.ID, "pos-list-body"))).get_attribute('innerText'))

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
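
One possible refinement of the wait above, offered as a sketch and not a confirmed fix: the container div exists even while it is empty, so waiting for the presence of the ID alone can return before any rows are injected. Waiting for the child rows may be closer to what is needed. The names rows_loaded and row_texts are hypothetical helpers, and the selector is taken from the DevTools markup in the question. The string "css selector" is the value of Selenium's By.CSS_SELECTOR.

```python
def rows_loaded(driver):
    """Custom wait condition: truthy once at least one row exists in the list body."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    rows = driver.find_elements("css selector", "#pos-list-body .list-body-row")
    # WebDriverWait.until() keeps polling while the condition returns a falsy value.
    return rows if rows else False


def row_texts(rows):
    """Return the stripped innerText of each row element."""
    return [(row.get_attribute("innerText") or "").strip() for row in rows]


# Intended use with a real driver (assumption: `driver` is already logged in
# and on the right page; not executed here):
#   from selenium.webdriver.support.ui import WebDriverWait
#   rows = WebDriverWait(driver, 20).until(rows_loaded)
#   print(row_texts(rows))
```

If this wait times out, the rows are never reaching the DOM that Selenium sees, which would point to a cause other than timing.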

5 Comments

The elements in the HTML are always blank after the page loads, but they exist in DevTools. So the result is the same with WebDriverWait or time.sleep. Thank you for your answer.
You can extract it using get_attribute('innerText'); that's the real answer in this answer.
Nothing was printed when I tried your answer. When I tried get_attribute('outerHTML') instead of get_attribute('innerText'), only the div tag was printed: <div id="pos-list-body" class="list-body"> and </div>.
Is it possible to share the link to the page?
I'm afraid I can't share the link to the page because the page requires a login. Thank you for your comment.
0

Element.outerHTML

The outerHTML property of an Element gets the serialized HTML fragment describing the element, including its descendants. It can also be set to replace the element with nodes parsed from the given string. To obtain only the HTML representation of an element's contents, use the innerHTML property instead. Reading outerHTML returns a DOMString containing an HTML serialization of the element and its descendants; setting it replaces the element and all of its descendants with a new DOM tree constructed by parsing the specified htmlString.


Solution

To get the HTML generated by JavaScript, you can use the following solution:

print(driver.execute_script("return document.getElementById('pos-list-body').outerHTML"))
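
If that call does return the rendered rows, the values could then be pulled out of the returned string. Below is a minimal sketch using only the standard library. The class DetailDataParser and the function extract_detail_values are hypothetical names, and the detail-data class is assumed from the DevTools markup in the question.

```python
from html.parser import HTMLParser


class DetailDataParser(HTMLParser):
    """Collect the text of every <div> whose class list contains 'detail-data'."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._depth = 0    # div nesting depth inside the current detail-data cell
        self._chunks = []  # text fragments seen inside the current cell

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # A nested div inside a cell: track it so the matching close is counted.
            self._depth += 1
        elif "detail-data" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.values.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_detail_values(html):
    """Return the stripped text of each detail-data cell, in document order."""
    parser = DetailDataParser()
    parser.feed(html)
    parser.close()
    return parser.values
```

Usage would look like extract_detail_values(driver.execute_script(...)), with the script from the line above. An empty list means the string contained no detail-data cells, which matches the empty-div symptom described in the question.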

4 Comments

Even with your solution code, only the div tag was returned: <div id="pos-list-body" class="list-body"> and </div>. Thank you for your answer.
Presumably you are looking for 1, a, 2 and b in the wrong place. That's not the desired node.
Thank you for your comment. The code in my question is indeed edited to simplify it, because the real page has many div tags and so on. Can you tell me what is wrong?
First of all, this isn't really a comment but a well-researched answer. However, without further details about the real-time scenario, it would be unwise to speculate any further.
0

Since the other answers don't seem to solve your issue, one remaining possibility is that more than one HTML element in the DOM has the ID pos-list-body, and the first element retrieved by this ID is really empty and is not your target. Solution: try selecting the <div> with XPath instead of the ID, or get all the elements with this ID in a list and print the innerHTML of each one to find the index of your target element.
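
The second suggestion could be sketched roughly as follows. The helper names dump_elements_with_id and non_empty_indexes are hypothetical, and the string "id" is the value of Selenium's By.ID. The functions only assume an object with find_elements, so they work with any Selenium WebDriver instance.

```python
def dump_elements_with_id(driver, element_id="pos-list-body"):
    """Return (index, innerHTML) for every element found with the given ID."""
    # "id" is the string value of selenium's By.ID.
    elements = driver.find_elements("id", element_id)
    return [
        (index, element.get_attribute("innerHTML") or "")
        for index, element in enumerate(elements)
    ]


def non_empty_indexes(dump):
    """Return the indexes whose innerHTML contains more than whitespace."""
    return [index for index, html in dump if html.strip()]


# Intended use with a real driver (not executed here):
#   dump = dump_elements_with_id(driver)
#   for index, html in dump:
#       print(index, repr(html[:200]))
#   print(non_empty_indexes(dump))
```

If the list has a single entry and it is empty, duplicate IDs are ruled out as the cause.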

1 Comment

Although I tried your advice, it didn't work. print(len(driver.find_elements(by=By.ID, value='pos-list-body'))) prints 1. The XPath result is []. The output of html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML") followed by print(html) doesn't contain any of the elements I expect. Thank you for your answer. I would appreciate any hints.
