Parse/extract table data using Python

Question

<html> 
<table border="1px"> 
<tr>
<td>yes</td>
<td>no</td>
</tr>
</table>
</html>

Is there any way to get the contents of the table (yes, no) other than with BeautifulSoup?

I'm a Python beginner, so any help or direction would be greatly appreciated.

Thank you

Yes, there is. Should you do it without a parser? Probably not.
– Jacob, Commented Jul 14, 2011 at 8:22
Okay, how do I parse it? Are there any tutorial sites you might suggest? Googling it didn't give fruitful results.
– Php Beginner, Commented Jul 14, 2011 at 8:30
If the structure of your markup is relatively stable and you can guarantee it's well-formed, you can try using regexes (for example, one for enumerating table rows and another for getting the cells within a row).
– Xion, Commented Jul 14, 2011 at 8:31
@PHP: the reason people like BeautifulSoup is that it is very flexible in the HTML it accepts, which is useful since a lot of what you find on the internet is broken. Things like lxml and HTMLParser are rather stricter about which mistakes they allow.
– Katriel, Commented Jul 14, 2011 at 8:35
@Xion: Will check out regexes. @katrielalex, I have been using BeautifulSoup.
– Php Beginner, Commented Jul 14, 2011 at 8:51
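
For reference, the two-regex idea from Xion's comment could look roughly like the sketch below. It assumes Python 3 and markup as clean as the snippet in the question: lowercase or uppercase `tr`/`td` tags, no nested tables, every tag closed. The names `ROW_RE` and `CELL_RE` are made up for this illustration. On messier HTML a real parser is the safer choice.

```python
import re

html = """
<html>
<table border="1px">
<tr>
<td>yes</td>
<td>no</td>
</tr>
</table>
</html>
"""

# One pattern enumerates the rows, the other pulls the cells out of a row.
# DOTALL lets "." match newlines, since rows and cells span several lines.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)

rows = [
    [cell.strip() for cell in CELL_RE.findall(row_html)]
    for row_html in ROW_RE.findall(html)
]
print(rows)  # [['yes', 'no']]
```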

Vasiliy Faronov · Accepted Answer · 2011-07-14 08:30:31Z

12

You can use the HTMLParser class that comes with the Python standard library (it lives in the HTMLParser module in Python 2 and in html.parser in Python 3).

>>> from html.parser import HTMLParser  # Python 3; in Python 2 the module was named HTMLParser
>>> data = '''
... <html> 
... <table border="1px"> 
... <tr>
... <td>yes</td>
... <td>no</td>
... </tr>
... </table>
... </html>
... '''
>>> class TableParser(HTMLParser):
...     def __init__(self):
...         super().__init__()
...         self.in_td = False  # True while we are inside a <td> cell
...     
...     def handle_starttag(self, tag, attrs):
...         if tag == 'td':
...             self.in_td = True
...     
...     def handle_data(self, data):
...         # Print only the text that appears inside a <td> cell.
...         if self.in_td:
...             print(data)
...     
...     def handle_endtag(self, tag):
...         # Leave the cell only on </td>, not on any nested closing tag.
...         if tag == 'td':
...             self.in_td = False
... 
>>> p = TableParser()
>>> p.feed(data)
yes
no
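
If you want the values back as data rather than printed, the same approach can be extended to collect cells row by row. This is only a sketch, assuming Python 3 and a simple table with no nested tables. The class name `TableRowParser` and its `rows` attribute are invented for this example and are not part of the standard library.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            if self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = """
<html>
<table border="1px">
<tr>
<td>yes</td>
<td>no</td>
</tr>
</table>
</html>
"""

parser = TableRowParser()
parser.feed(html)
print(parser.rows)  # [['yes', 'no']]
```

Each inner list is one table row, so `parser.rows[0][1]` gives `'no'` for the table in the question.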

