-3

Is there a way to get all the links and their text from the HTML file below? I've tried many approaches and read a lot of answers, but I don't really get it.

<tr>
    <td><a href="pr_background-image.asp">background-image</a></td>
    <td>Specifies one or more background images for an element</td>
    <td>1</td>
</tr>

I want it to return the .asp link as well as the description below it. The newline characters are my main problem; they show up as \\r\\n.

UPDATE: I don't want to use any external module, not even BeautifulSoup. Just regex, because what I'm working on will be shared, and there would be no point if users had to install something else.

  • Check out the BeautifulSoup module for parsing HTML/XML files. Commented Jan 12, 2016 at 1:34
  • As a rule of thumb, using regex to match HTML isn't recommended: stackoverflow.com/questions/1732348/… However, I recommend that you look at the Python libraries that already do this for you, like: stackoverflow.com/questions/17126686/… Commented Jan 12, 2016 at 1:35
  • Sure. Use an HTML parser and learn XPath. Commented Jan 12, 2016 at 1:36
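For what it's worth, Python's standard library already includes `xml.etree.ElementTree`, which understands a limited subset of XPath and needs no extra install. A minimal sketch, assuming the markup is well-formed like the snippet in the question (real-world HTML often isn't, in which case `ET.fromstring` raises `ParseError`); the function name `extract_rows` is hypothetical:

```python
import xml.etree.ElementTree as ET

# The question's snippet. ElementTree is an XML parser, so this only works
# if the markup is well-formed (every tag closed, attributes quoted).
SNIPPET = """<tr>
    <td><a href="pr_background-image.asp">background-image</a></td>
    <td>Specifies one or more background images for an element</td>
    <td>1</td>
</tr>"""


def extract_rows(markup):
    """Return (href, link_text, description) for each <tr> containing a link."""
    # Wrap in a dummy root so a fragment with several <tr>s parses as one document.
    root = ET.fromstring("<table>" + markup + "</table>")
    rows = []
    # iterfind/find accept ElementTree's limited XPath syntax.
    for tr in root.iterfind(".//tr"):
        link = tr.find("td/a")
        cells = tr.findall("td")
        if link is None or len(cells) < 2:
            continue
        # itertext() collects all text in the cell; strip() drops newlines/indentation.
        description = "".join(cells[1].itertext()).strip()
        rows.append((link.get("href"), (link.text or "").strip(), description))
    return rows


print(extract_rows(SNIPPET))
```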

2 Answers

0

Using a regex for this is rather limiting; parsing the HTML and using XPath or DOM queries would be much more readable.
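Parsing doesn't require an external module either: `html.parser.HTMLParser` ships with Python. A minimal sketch, assuming the link is in the first `<td>` and the description in the second `<td>` of each row, as in the question; the class name `RowLinkParser` is hypothetical. The `\r\n` line breaks arrive as ordinary text data and are simply stripped:

```python
from html.parser import HTMLParser

# The question's snippet with Windows line endings (\r\n).
SNIPPET = (
    "<tr>\r\n"
    '    <td><a href="pr_background-image.asp">background-image</a></td>\r\n'
    "    <td>Specifies one or more background images for an element</td>\r\n"
    "    <td>1</td>\r\n"
    "</tr>\r\n"
)


class RowLinkParser(HTMLParser):
    """Collect (href, link_text, description) from table rows (hypothetical helper).

    Assumes the link is in the first <td> and the description in the second.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # text of each <td> in the current <tr>
        self._href = None   # href of the first <a> in the current <tr>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True
        elif tag == "a" and self._cells is not None and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            if self._href and len(self._cells) >= 2:
                # \r\n and indentation are plain text data; strip them here.
                self.rows.append((self._href, self._cells[0].strip(), self._cells[1].strip()))
            self._cells = None

    def handle_data(self, data):
        # Only keep text that sits inside a <td>.
        if self._in_td and self._cells:
            self._cells[-1] += data


parser = RowLinkParser()
parser.feed(SNIPPET)
parser.close()
print(parser.rows)
```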

On top of that, even without the newlines, writing a sufficiently general regex would be tricky.

See this post for multiline regexes. With that, you'll probably want one capture group to grab the link and another for the td cells.
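A minimal sketch of the multiline approach using Python's `re` module. `re.DOTALL` makes `.` match newline characters too, so one pattern can span the `\r\n` between cells; the `page` string and the `ROW_RE` name are hypothetical. Note that with `DOTALL`, the lazy `.*?` between cells can run on into the next row if a row is missing its second cell:

```python
import re

# Hypothetical input: the question's snippet with Windows (\r\n) line endings.
page = (
    "<tr>\r\n"
    '    <td><a href="pr_background-image.asp">background-image</a></td>\r\n'
    "    <td>Specifies one or more background images for an element</td>\r\n"
    "    <td>1</td>\r\n"
    "</tr>\r\n"
)

# re.DOTALL lets "." match \r and \n too, so one pattern can span lines.
# Group 1: the link target, group 2: the link text, group 3: the next cell.
ROW_RE = re.compile(
    r'<td><a href="([^"]+)">(.*?)</a></td>.*?<td>(.*?)</td>',
    re.DOTALL,
)

for href, text, description in ROW_RE.findall(page):
    print(href, text, description.strip(), sep=" | ")
```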


-1

The easiest way to work with HTML in Python is BeautifulSoup or a similar module; I recommend you look into it. If you want to stick with regex, you can allow for tabs, spaces, newlines, etc. between the two <td> tags like this:

<td><a href="(.+?)">(.+?)</a></td>\s*<td>(.+?)</td>
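A minimal sketch applying a pattern of this shape with `re.findall`, assuming you want every row rather than just `background-image`, so the link text is captured as its own group and `\s*` stands in for the explicit whitespace alternation. The second row in `page` is hypothetical input, added only to show that `findall` returns one tuple per row:

```python
import re

# \s* absorbs any mix of spaces, tabs, \r and \n between the two cells.
PATTERN = re.compile(r'<td><a href="(.+?)">(.+?)</a></td>\s*<td>(.+?)</td>')

# Hypothetical file contents (the second row is made up for illustration).
# In practice you might read them with open(path, encoding="utf-8").read().
page = (
    '<tr>\r\n    <td><a href="pr_background-image.asp">background-image</a></td>\r\n'
    "    <td>Specifies one or more background images for an element</td>\r\n"
    "    <td>1</td>\r\n</tr>\r\n"
    '<tr>\r\n    <td><a href="pr_background-color.asp">background-color</a></td>\r\n'
    "    <td>Sets the background color of an element</td>\r\n"
    "    <td>1</td>\r\n</tr>\r\n"
)

for href, name, description in PATTERN.findall(page):
    print(f"{name}: {href} -> {description}")
```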

1 Comment

I don't want to use BeautifulSoup because of the nature of the project. @Tobias R
