
I can't find a specific comment in Python, for example <!-- why -->. My main goal is to find all the links between two specific comments, something like a parser. I tried this with BeautifulSoup:

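from urllib import urlopen      # Python 2; on Python 3 it lives in urllib.request
from bs4 import BeautifulSoup

over = urlopen("http://www.gamespot.com").read()   # urlopen needs the scheme
soup = BeautifulSoup(over)
print soup.find("<!--why-->")   # find() matches tag names, not comment text, so this prints None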

But it doesn't work. I think I might have to use a regex instead of BeautifulSoup.

Please help.

Example: we have HTML code like this:

<!--why-->
www.godaddy.com
<p> nice one</p>
www.wwf.com
<!-- why not-->

Edit: other content, such as tags, might appear between the two comments.

I also need to store all the links.
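Regex over raw HTML is fragile. As an alternative to both regex and BeautifulSoup, here is a minimal sketch using only Python 3's standard-library `html.parser`. It tracks whether the parser is currently between the two comments, so tags in between do no harm. What counts as a "link" is an assumption: bare tokens starting with `www.` or `http(s)://`, plus `href` values of `<a>` tags. A small regex is applied only to text nodes, never to the markup. The names `CommentRangeLinkParser` and `links_between_comments` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: a "link" is a bare token starting with http(s):// or www.
LINK_RE = re.compile(r"(?:https?://|www\.)\S+")


class CommentRangeLinkParser(HTMLParser):
    """Collect links found between a start comment and an end comment.

    Comment text is compared after stripping whitespace, so
    <!--why--> and <!-- why --> both match start="why".
    """

    def __init__(self, start, end):
        super().__init__()
        self.start = start
        self.end = end
        self.inside = False  # True while between the two comments
        self.links = []

    def handle_comment(self, data):
        text = data.strip()
        if text == self.start:
            self.inside = True
        elif text == self.end:
            self.inside = False

    def handle_starttag(self, tag, attrs):
        # Also pick up <a href="..."> targets inside the range.
        if self.inside and tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Text between tags, including text nested inside tags like <p>.
        if self.inside:
            self.links.extend(LINK_RE.findall(data))


def links_between_comments(html, start="why", end="why not"):
    parser = CommentRangeLinkParser(start, end)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.links


sample = """
<!--why-->
www.godaddy.com
<p> nice one</p>
www.wwf.com
<!-- why not-->
<p>www.outside.com</p>
"""

print(links_between_comments(sample))  # ['www.godaddy.com', 'www.wwf.com']
```

The returned list is the "stored" set of links. Text outside the pair of comments, such as `www.outside.com`, is ignored.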

Give a real example; that will help everyone. (Commented Oct 8, 2012 at 0:24)

1 Answer


If you want all the comments, you can use findAll with a callable:

>>> from bs4 import BeautifulSoup, Comment
>>> 
>>> s = """
... <p>header</p>
... <!-- why -->
... www.test1.com
... www.test2.org
... <!-- why not -->
... <p>tail</p>
... """
>>> 
>>> soup = BeautifulSoup(s)
>>> comments = soup.findAll(text = lambda text: isinstance(text, Comment))
>>> 
>>> comments
[u' why ', u' why not ']

Once you have them, you can use the usual navigation attributes (such as .next) to move around:

>>> comments[0].next
u'\nwww.test1.com\nwww.test2.org\n'
>>> comments[0].next.split()
[u'www.test1.com', u'www.test2.org']

Depending on what the page actually looks like, you may have to tweak it a bit, and you'll have to choose which comments you want, but that should work to get you started.
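`.next` returns only the single node right after the comment. If tags sit between the two comments, as in the question's edit, you need to keep walking until you reach the closing comment. Here is a minimal sketch. It assumes Python 3 and bs4 4.4 or later, for the `string=` argument and `.next_elements`. `strings_between` is a hypothetical helper. Treating whitespace-separated tokens that start with `www.` or `http(s)://` as links is an assumption about the page.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def strings_between(start_comment, end_text):
    """Yield each text node after start_comment, stopping at the first
    comment whose stripped text equals end_text (which is not yielded)."""
    for element in start_comment.next_elements:
        if isinstance(element, Comment) and element.strip() == end_text:
            break
        # Tags themselves are skipped; their inner text arrives as separate strings.
        if isinstance(element, NavigableString) and not isinstance(element, Comment):
            yield element


html = """
<p>header</p>
<!-- why -->
www.test1.com
<p>nice one</p>
www.test2.org
<!-- why not -->
<p>www.tail.com</p>
"""

soup = BeautifulSoup(html, "html.parser")
# Pick the opening comment by its exact (stripped) text.
start = soup.find(string=lambda text: isinstance(text, Comment) and text.strip() == "why")

links = [
    token
    for text in strings_between(start, "why not")
    for token in text.split()
    if token.startswith(("www.", "http://", "https://"))
]
print(links)  # ['www.test1.com', 'www.test2.org']
```

If the opening comment is missing, `soup.find` returns `None`. Check for that before walking on a real page.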

Edit:

If you only want the comments that match some specific text, you can do something like:

>>> comments = soup.findAll(text = lambda text: isinstance(text, Comment) and text.strip() == 'why')
>>> comments
[u' why ']

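or you could filter the full list of comments after the fact using a list comprehension:

>>> all_comments = soup.findAll(text = lambda text: isinstance(text, Comment))
>>> [c for c in all_comments if c.strip().startswith("why")]
[u' why ', u' why not ']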

6 Comments

Nice solution! I didn't realize I had to import Comment, so I couldn't get it to work.
The source code might have a lot of comment-blocks. I need to search only for the ones that start with 'why'. Does this work in this way?
@georgemano: I've edited it. It might be worthwhile reading through a Python tutorial -- there are lots of ways to do neat things which are easy once you know them but are hard to guess.
@georgemano: you've now asked three different questions with three slightly different answers. Respectfully, that's not the best way to get help.
It is not my goal to make you suffer or cause you trouble. I just want to learn the things I can't understand.