-1

I have a question about extracting text with Python regex. I want to do this with regex only, without using an HTML module such as bs4.

The example text is as follows:

tr_range =

<tr>
    <td class="table-basic-l">
        Resolution
    </td>
    <td class="table-basic-l">
        Horizontal Frequency (kHz)
    </td>
    <td class="table-basic-l">
        Vertical Frequency (Hz)
    </td>
</tr>

I'd like to extract all the text inside the td elements, i.e. Resolution, Horizontal Frequency (kHz), and Vertical Frequency (Hz), using regex only.

I am trying to exclude the opening tags of all the td elements, but it has not been easy for me so far.

  • Do you just want the text, or an array with the <td> texts in it? Commented Nov 6, 2018 at 6:57
  • I do not condone this summoning of Cthulhu. Commented Nov 6, 2018 at 6:57
  • I just want the text, not any attributes. Commented Nov 6, 2018 at 7:00
  • You should really use HTMLParser. Commented Nov 6, 2018 at 7:02
  • Yeah, I am able to solve this using an HTML parser, but I want to know if it is possible with regex. Commented Nov 6, 2018 at 7:06

1 Answer

2

You can get the text by removing the HTML tags with a regex like this (it only targets tags whose names start with t, such as tr and td, so it is only suitable for table markup):

import re

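# Adjacent string literals are concatenated into one string (no newlines).
html = (
    '<tr>'
    '<td class="table-basic-l">'
    '    Resolution'
    '</td>'
    '<td class="table-basic-l">'
    '    Horizontal Frequency (kHz)'
    '</td>'
    '<td class="table-basic-l">'
    '    Vertical Frequency (Hz)'
    '</td>'
    '</tr>'
)

# Remove every opening or closing tag whose name starts with "t" (tr, td, ...).
# The cell texts are left joined in one string, leading spaces included.
print(re.sub(r"</?t.*?>", "", html))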
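The substitution above returns one string with all cell texts run together. If a list with one entry per td is wanted, a capturing group with re.findall is one possible variant. This is a minimal sketch, not part of the original answer: the names TD_TEXT and td_texts are made up for illustration, and it assumes td elements are not nested and that the markup is simple and well formed.

```python
import re

# Same markup as in the question, kept as a multi-line string.
tr_range = """
<tr>
    <td class="table-basic-l">
        Resolution
    </td>
    <td class="table-basic-l">
        Horizontal Frequency (kHz)
    </td>
    <td class="table-basic-l">
        Vertical Frequency (Hz)
    </td>
</tr>
"""

# <td\b[^>]*> matches the opening tag together with any attributes.
# (.*?) lazily captures everything up to the nearest </td>.
# re.DOTALL lets "." match newlines, so multi-line cells are captured too.
TD_TEXT = re.compile(r"<td\b[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)


def td_texts(html):
    """Return the stripped text of each td element (hypothetical helper)."""
    return [text.strip() for text in TD_TEXT.findall(html)]


cells = td_texts(tr_range)
print(cells)
# ['Resolution', 'Horizontal Frequency (kHz)', 'Vertical Frequency (Hz)']
```

Because the capture stops only at a literal </td>, a stray < or > inside the cell text is kept as is. Nested tables or malformed markup would still break it, which is where a real HTML parser is the safer choice.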

7 Comments

You're almost there, but what if there is a < or > in the text?
Then use Beautiful Soup.
@Cho shouldn't be a problem see here: regexr.com/42i67
@Cho if you have only <tr> and <td> elements, then my edit might help you?
Yes, of course it does! Actually, I was shocked when I saw your code.