Rewriting the question since the previous one was unclear. I am trying to extract elements from an HTML page using BeautifulSoup. Below is a snippet of the page:

<span class="cars">Imported</span><br>
<span class="auto"> Cycle 1 / Step 1</span>
<hr noshade><BR>
<table class="table" width="100%" border="0">
     <colgroup>
        <col width="200">
        <col>
     </colgroup>
<tr><td>Macro:</td><td align="left">abc</td></tr>
<tr><td>Comment: </td><td align="left">Valid</td></tr>
<tr><td>status:</td><td align="left" class="prog_stat_pass">PASS</td></tr>
</table></td></tr>

<span class="cars">Exported</span><br>
<span class="manual"> Cycle 1 / Step 26</span>
<hr noshade><BR>
<table class="table" width="100%" border="0">
     <colgroup>
        <col width="200">
        <col>
     </colgroup>
<tr><td>Macro:</td><td align="left">def</td></tr>
<tr><td>Comment: </td><td align="left">Valid</td></tr>
<tr><td>status:</td><td align="left" class="prog_stat_blocked">BLOCKED</td></tr>
</table></td></tr>

<span class="cars">Transferred</span><br>
<span class="manual"> Cycle 1 / Step 26</span>
<hr noshade><BR>
<table class="table" width="100%" border="0">
     <colgroup>
        <col width="200">
        <col>
     </colgroup>
<tr><td>Macro:</td><td align="left">efg</td></tr>
<tr><td>Comment: </td><td align="left">Invalid</td></tr>
<tr><td>status:</td><td align="left" class="prog_stat_fail">Failed</td></tr>
</table></td></tr>

I need the pass, fail and blocked statuses along with their corresponding comments, stored in three different lists. My output should look like this:

PASS ['Valid']
FAIL ['Invalid']
BLOCKED ['Valid']

But I am getting this:

PASS []
FAIL []
BLOCKED [['Valid,Valid,Invalid']]

My Code:

self.table = self.soup_file.findAll(class_="table")


self.Macro = [column.findAll('td')[1].get_text() for column in self.table]
self.Comment = [column.findAll('td')[3].get_text() for column in self.table]
self.status = [column.findAll('td')[5].get_text() for column in self.table]
if self.status == "PASS":
    self.pass_.append(self.Comment)
elif self.status == "FAIL":
    self.fail_.append(self.Comment)
else:
    self.blocked_.append(self.Comment)

Only the final else branch ever takes effect: even when there are pass and fail statuses, everything ends up in the blocked list when I display the lists. Any help would be appreciated.

Thanks in advance.

1 Answer

You can do the following:

  • iterate over the tables
  • collect each table's data into a dictionary
  • group the dictionaries in a resulting dictionary whose keys are the status values

Something along these lines:

In [3]: from collections import defaultdict

In [4]: results = defaultdict(list)

In [5]: for table in soup.select('table.table'):
            table_data = {row.td.get_text(strip=True).rstrip(':'): row('td')[-1].get_text(strip=True)  
                          for row in table('tr')}
            results[table_data['status']].append(table_data)

In [6]: dict(results)
Out[6]: 
{'PASS': [{'Macro': 'abc', 'Comment': 'Valid', 'status': 'PASS'}],
 'BLOCKED': [{'Macro': 'def', 'Comment': 'Valid', 'status': 'BLOCKED'}],
 'Failed': [{'Macro': 'efg', 'Comment': 'Invalid', 'status': 'Failed'}]}
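
If only the comments are needed for each status, the same grouping idea can store just the Comment value. Note that in the question's code `self.status` is a list, so `self.status == "PASS"` compares a list with a string and is always false, which is why everything lands in the else branch. The following is a minimal sketch, not a definitive implementation: the `html` string is a trimmed copy of the markup from the question, `comments_by_status` is a name chosen here for illustration, and `html.parser` is assumed as the parser.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (assumed to be representative).
html = """
<table class="table" width="100%" border="0">
<tr><td>Macro:</td><td align="left">abc</td></tr>
<tr><td>Comment: </td><td align="left">Valid</td></tr>
<tr><td>status:</td><td align="left" class="prog_stat_pass">PASS</td></tr>
</table>
<table class="table" width="100%" border="0">
<tr><td>Macro:</td><td align="left">def</td></tr>
<tr><td>Comment: </td><td align="left">Valid</td></tr>
<tr><td>status:</td><td align="left" class="prog_stat_blocked">BLOCKED</td></tr>
</table>
<table class="table" width="100%" border="0">
<tr><td>Macro:</td><td align="left">efg</td></tr>
<tr><td>Comment: </td><td align="left">Invalid</td></tr>
<tr><td>status:</td><td align="left" class="prog_stat_fail">Failed</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Maps each status text to the list of comments found under it.
comments_by_status = defaultdict(list)

for table in soup.select("table.table"):
    row_data = {}
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # skip rows that are not label/value pairs
        label = cells[0].get_text(strip=True).rstrip(":")
        row_data[label] = cells[-1].get_text(strip=True)
    # Decide per table, using that table's own status and comment.
    comments_by_status[row_data["status"]].append(row_data["Comment"])

print(dict(comments_by_status))
# {'PASS': ['Valid'], 'BLOCKED': ['Valid'], 'Failed': ['Invalid']}
```

The keys are whatever text appears in the status cell, so the third table is grouped under 'Failed' rather than 'FAIL'.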

1 Comment

Thank you for the suggestion. But I only want the corresponding Comment for each status in a list, as I am already able to retrieve the entire list. Unfortunately, when I tried to retrieve only the comment for a given status, it raised a KeyError.
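
The comment does not show the failing code, so the cause of the KeyError is a guess. One plausible cause is looking up a status that is not a key in the grouped dictionary: the page text for a failed step is 'Failed', not 'FAIL', and a plain `dict` raises KeyError for a missing key. A minimal sketch under that assumption, with the grouped data written out by hand in the shape shown in the answer and `comments_for` as a hypothetical helper name:

```python
from collections import defaultdict

# Hand-written data in the same shape as the answer's output (illustrative).
results = defaultdict(list)
for entry in (
    {"Macro": "abc", "Comment": "Valid", "status": "PASS"},
    {"Macro": "def", "Comment": "Valid", "status": "BLOCKED"},
    {"Macro": "efg", "Comment": "Invalid", "status": "Failed"},
):
    results[entry["status"]].append(entry)

grouped = dict(results)  # a plain dict raises KeyError for unknown keys


def comments_for(status):
    """Return the comments recorded under `status`, or [] if it is absent."""
    return [entry["Comment"] for entry in grouped.get(status, [])]


print(comments_for("PASS"))    # ['Valid']
print(comments_for("Failed"))  # ['Invalid']
print(comments_for("FAIL"))    # [] because the key on the page is 'Failed'
```

Using `.get(status, [])` avoids the exception when a status never occurs on the page.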
