I am either receiving an error or nothing is being parsed/written with the following code:

soup = BeautifulSoup(browser.page_source, 'html.parser')
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
rows = userinfo.find_all(attrs="value")

with open('testfile1.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(rows)

rows = userinfo.find_all(attrs="value")

AttributeError: 'ResultSet' object has no attribute 'find_all'

So I tried a for loop with print to test it. The program runs without errors, but it finds nothing:

userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
for row in userinfo:
    rows = row.find_all(attrs="value")
    print(rows)

This is the HTML I am trying to parse. I want to extract the text from the value attributes:

<div class="controlHolder">
                        <div id="usernameWrapper" class="fieldWrapper">
                            <span class="styled">Username:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbUsername" type="text" value="username" maxlength="16" id="ctl00_cleanMainPlaceHolder_tbUsername" disabled="disabled" tabindex="1" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnUserName" id="ctl00_cleanMainPlaceHolder_hdnUserName" value="AAubrey"> 
                            </div>
                        </div>
                        <div id="fullNameWrapper" class="fieldWrapper">
                            <span class="styled">Full Name:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbFullName" type="text" value="Full Name" maxlength="50" id="ctl00_cleanMainPlaceHolder_tbFullName" tabindex="2" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnFullName" id="ctl00_cleanMainPlaceHolder_hdnFullName" value="Anthony Aubrey">
                            </div>
                        </div>
                        <div id="emailWrapper" class="fieldWrapper">
                            <span class="styled">Email:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbEmail" type="text" value="[email protected]" maxlength="60" id="ctl00_cleanMainPlaceHolder_tbEmail" tabindex="3" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnEmail" id="ctl00_cleanMainPlaceHolder_hdnEmail" value="[email protected]">
                                <span id="ctl00_cleanMainPlaceHolder_validateEmail" style="color:Red;display:none;">Invalid E-Mail</span>
                            </div>
                        </div>
                        <div id="commentWrapper" class="fieldWrapper">
                            <span class="styled">Comment:</span>
                            <div class="theField">
                                <textarea name="ctl00$cleanMainPlaceHolder$tbComment" rows="2" cols="20" id="ctl00_cleanMainPlaceHolder_tbComment" tabindex="4" class="textbox longTextBox"></textarea>
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnComment" id="ctl00_cleanMainPlaceHolder_hdnComment">
                            </div>
                        </div>

1 Answer

Your first error occurs because find_all returns a ResultSet, which behaves more or less like a list and has no find_all method of its own. You have to iterate over the elements of userinfo and call find_all on each one instead.

For your second issue, I'm pretty sure that when attrs is passed a string instead of a dictionary, BeautifulSoup searches for elements with that string as their class. The HTML you provided contains no elements with the class value, so it makes sense that nothing is found. You can read an element's value attribute with .get('value').
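To illustrate the difference, here is a minimal sketch. The HTML is a cut-down, made-up fragment modelled on the question's markup, and the variable names are only for illustration. It compares passing a bare string to attrs with asking for any tag that has a value attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, shortened from the HTML in the question.
html = """
<div class="fieldWrapper">
  <input type="text" value="username">
  <input type="hidden" value="AAubrey">
  <input type="hidden">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A bare string is matched against the class attribute.
# No tag here has class="value", so the result is empty.
by_class = soup.find_all(attrs="value")

# A dictionary with True matches every tag that has a value attribute at all.
with_value = soup.find_all(attrs={"value": True})

# .get('value') reads the attribute from each matching tag.
values = [tag.get("value") for tag in with_value]
print(len(by_class), values)
```

Note that this picks up the hidden inputs as well, so you would still need to narrow the search (for example by type) if you only want the visible fields.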

To print the value of each text input, the following code should work. (The try/except is just so the script doesn't crash if a text input isn't found.)

for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
    try:
        print(field_wrapper.find("input", attrs={"type": "text"}).get('value'))
    except AttributeError:
        # find() returned None: this wrapper has no text input
        continue

8 Comments

I see what you mean. I tried the code you provided, but again it prints nothing. I am trying to get the text from value="username", value="Full Name" and value="[email protected]", as I am trying to pull text from a form.
Gotcha. My edited answer above prints the expected output when I initialize BeautifulSoup with the source HTML you provided. If it still prints nothing, it's possible browser.page_source isn't what you expect it to be, or your parser isn't handling the page correctly.
I tried the new version you wrote and still nothing. I put except: print('no text found') just to see if it would print anything, but still nothing. This seems strange. I think you are right, there seems to be something wrong with the page source. I am using selenium to get to this point in the code with no issues.
Hmmm, okay. It's probably a parser / soup issue then. Is browser.page_source a sensible value? Also, does the solution above work for you when you initialize BeautifulSoup with the sample HTML?
The solution above works great with the sample HTML, thanks a lot for that. I am using selenium to navigate the site to get to the desired page as information is required to access it. Perhaps there is some issue with the page I am pointing to. browser.page_source should point BeautifulSoup to the correct page but it does not seem like it is parsing from there.
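One way to check whether browser.page_source is a sensible value is to summarize what BeautifulSoup actually sees in it. The sketch below is only a diagnostic idea, not a confirmed fix: summarize_source is a hypothetical helper, and the iframe count is included on the assumption that the form might sit inside a frame, in which case it would not appear in the outer document's source.

```python
from bs4 import BeautifulSoup


def summarize_source(page_source):
    """Count the elements the extraction loop depends on."""
    soup = BeautifulSoup(page_source, "html.parser")
    return {
        "length": len(page_source),
        "field_wrappers": len(
            soup.find_all("div", attrs={"class": "fieldWrapper"})
        ),
        "text_inputs": len(soup.find_all("input", attrs={"type": "text"})),
        # Assumption: a non-zero count hints that the form may be in a frame.
        "iframes": len(soup.find_all("iframe")),
    }


# With selenium this would be summarize_source(browser.page_source);
# a literal string is used here so the sketch runs on its own.
example = '<div class="fieldWrapper"><input type="text" value="a"></div>'
summary = summarize_source(example)
print(summary)
```

If field_wrappers comes back as 0 on the real page, the loop never runs at all, which would explain why neither the values nor the except message is printed.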