I've written a script to parse HTML and print only the text content, ignoring the tags. But my program has a problem, and I am not sure what it is. Please help me.
import urllib.request
import re
from bs4 import BeautifulSoup

url = "www.example.com"

def hi():
    dep = urllib.request.urlopen(url)
    soup = BeautifulSoup(dep, 'html.parser')
    for link in soup.find_all('p', string=True):
        result = re.sub(b'<.*?>', "", link)
        print(result)

hi()
The website link.
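
For comparison, here is a minimal sketch of one way the script could be made to work, not a definitive fix. Two things in the script look likely to fail: `urlopen` expects a full URL including the scheme (so `"www.example.com"` would probably need an `http://` prefix, which is an assumption here), and `re.sub` is given a bytes pattern together with a BeautifulSoup tag object rather than a string. Beautiful Soup's `get_text()` already returns the text without markup, so the regular expression may not be needed at all. The names `paragraph_texts` and `fetch` and the inline sample HTML are hypothetical, introduced only for illustration. The sketch also drops the `string=True` filter, so paragraphs that contain nested tags are included.

```python
import urllib.request
from bs4 import BeautifulSoup


def paragraph_texts(html):
    """Return the text of every <p> element with the markup removed."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() returns only the text nodes, so no regex is required.
    return [p.get_text().strip() for p in soup.find_all("p")]


def fetch(url):
    """Download a page; the URL must include the scheme, e.g. http://."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# Hypothetical inline sample so the sketch runs without network access.
sample = "<html><body><p>Hello <b>world</b></p><p>Second</p></body></html>"
for text in paragraph_texts(sample):
    print(text)
```

To run it against a live page, something like `paragraph_texts(fetch("http://www.example.com"))` should work, assuming the site is reachable and the added scheme is correct.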
