
I've written a script that parses HTML and prints only the text content, ignoring the tags. The program fails, though, and I am not sure why. Please help me.


import urllib.request
import re
from bs4 import BeautifulSoup
url = "www.example.com"

def hi():
    dep = urllib.request.urlopen(url)
    soup = BeautifulSoup(dep, 'html.parser')
    for link in soup.find_all('p', string=True):
        result = re.sub(b'<.*?>', "", link)
        print (result)
hi() 

The website link.

  • add the code here. Commented Mar 11, 2016 at 10:29
  • and make sure to include the full traceback as text and what you have tried to solve the issue. Commented Mar 11, 2016 at 10:30
  • @Vasanth post the code not the url you tried to scrape.. Commented Mar 11, 2016 at 10:32
  • I have added my code here. thanks in advance. Commented Mar 12, 2016 at 13:55
  • Convert your screenshot to valid code and a valid traceback. What you posted before editing was really messy. Commented Mar 12, 2016 at 14:54

1 Answer

I believe the link variable holds a Beautiful Soup object (a Tag or NavigableString) rather than a plain str, so re.sub cannot process it directly.

Cast it to a string explicitly:

for link in soup.find_all('p', string=True):
    result = re.sub(b'<.*?>', "", str(link))
    print (result)
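
A minimal sketch of the cast-then-substitute idea, runnable without a network connection or bs4. The hard-coded `paragraph_html` string and the `strip_tags` helper are hypothetical stand-ins for `str(link)` and the loop body; they are not part of the original code. The pattern here is a str, so it matches the type of the text being searched.

```python
import re

# Hypothetical stand-in for str(link): the markup of one <p> element.
paragraph_html = '<p>Hello <b>world</b>, see <a href="/x">this page</a>.</p>'

# Non-greedy match of everything between '<' and the next '>'.
# Both the pattern and the input are str.
TAG_PATTERN = re.compile(r'<.*?>')

def strip_tags(markup):
    """Remove anything that looks like a tag from a str of markup."""
    return TAG_PATTERN.sub('', markup)

text = strip_tags(paragraph_html)
print(text)
```

A regex like this is only a rough approximation of tag removal; Beautiful Soup's own `get_text()` method on the element is probably the more robust route when bs4 is already in use.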

5 Comments

But now it shows me a TypeError: cannot use a bytes pattern on a string-like object.
change b'<.*?>' to r'<.*?>'.
Hurray... Thank you so much. Can you explain that line? I copied it from another piece of code, so I don't know the logic behind it :)
@VasanthPrabakar, r is for regexp, b is for byte. Their behaviour also varies from version to version of python.
r is for raw, not regexp. It helps with regular expression because they use `\` differently from how strings normally treat it, but it was not created for regular expressions; it just means raw.
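
The points from this thread can be checked with a short sketch, assuming Python 3 and a hard-coded `html` string used purely for illustration. It shows that mixing a bytes pattern with a str input raises TypeError, that matching types work either way, and that the `r` prefix only affects how backslashes in the literal are read.

```python
import re

html = '<p>some <i>text</i></p>'  # illustrative input, not from the question

# A bytes pattern applied to a str raises TypeError.
try:
    re.sub(b'<.*?>', '', html)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Matching types work: str pattern on str, bytes pattern on bytes.
from_str = re.sub(r'<.*?>', '', html)
from_bytes = re.sub(rb'<.*?>', b'', html.encode('utf-8'))

# The r prefix changes only how backslashes in the literal are interpreted.
raw_len = len(r'\n')     # backslash followed by 'n'
cooked_len = len('\n')   # a single newline character

# This pattern has no backslashes, so the r prefix makes no difference to it.
same_pattern = (r'<.*?>' == '<.*?>')

print(mixed_ok, from_str, from_bytes, raw_len, cooked_len, same_pattern)
```

So in this particular case the fix comes from dropping `b`, not from adding `r`; a plain `'<.*?>'` would behave the same.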
