I have a JSON file with a "description" key whose value contains lots of HTML tags, and I would like to remove them. The tags are HTML-encoded, like:
&lt;ul&gt; instead of <ul>
I've tried text.replace('<.*?>',''), but nothing gets removed.
I've also tried using BeautifulSoup:
text = soup.get_text()
But that doesn't work either: it only decodes the encoded HTML tags. Finally, I've tried:
soup = BeautifulSoup(text)
text = soup.get_text()
text = text.replace('<.*?>','')
This combines the two approaches, but the tags still aren't deleted.
What I now have in the text variable (after BeautifulSoup has decoded the HTML tags):
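A single pass can't remove the tags because the input is encoded twice over in effect: the first decode only turns entities like &lt;ul&gt; into real tags like <ul>, which then have to be stripped in a second step. Below is a minimal sketch of that two-pass idea using only the standard library, swapping in html.unescape for the decoding step and html.parser.HTMLParser (instead of BeautifulSoup) for the text extraction. The names TextCollector and strip_encoded_html, and the sample string, are hypothetical and assume the input is entity-encoded as described above.

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps the text between tags, drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text outside of tags.
        self.parts.append(data)


def strip_encoded_html(encoded):
    # Pass 1: decode entities such as &lt;ul&gt; into real tags like <ul>.
    decoded = html.unescape(encoded)
    # Pass 2: parse the decoded markup and keep only the text nodes.
    collector = TextCollector()
    collector.feed(decoded)
    collector.close()
    return "".join(collector.parts).strip()


# Hypothetical encoded input, modeled on the description value in the question.
sample = ("&lt;/li&gt;&lt;/ul&gt;&lt;p&gt; &lt;/p&gt;"
          "&lt;p&gt;&lt;strong&gt;TESTING AND QUALITY&lt;/strong&gt;&lt;/p&gt;")
print(strip_encoded_html(sample))  # TESTING AND QUALITY
```

Using a real parser for the second pass avoids relying on a regular expression to recognize tags.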
"description":"</li></ul><p> </p><p><strong>TESTING AND QUALITY</strong></p><ul><li>....."
What I want in the text variable:
"description":"TESTING AND QUALITY"
text.replace() doesn't recognize regular expressions. It's looking for the literal text <.*?>, which of course isn't there.
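To apply the pattern as a regular expression, re.sub() from the standard re module is needed instead. A minimal sketch, assuming text already holds the decoded markup shown in the question (the string below is a shortened copy of it):

```python
import re

# Decoded markup, as left in `text` after BeautifulSoup has decoded the entities.
text = "</li></ul><p> </p><p><strong>TESTING AND QUALITY</strong></p>"

# str.replace() treats '<.*?>' as literal characters, so nothing is removed.
unchanged = text.replace('<.*?>', '')

# re.sub() interprets the same pattern as a regex; the non-greedy .*?
# stops at the first '>' so each tag is removed separately.
stripped = re.sub(r'<.*?>', '', text).strip()

print(unchanged == text)  # True
print(stripped)           # TESTING AND QUALITY
```

A regex like this works for simple tags, but it can misfire when an attribute value contains a '>' character; a real HTML parser handles those cases.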