TypeError: expected string or bytes-like object

Question

I've written a script to parse HTML and print only the text content, ignoring the tags. The program fails with the TypeError in the title, and I am not sure what causes it. Please help me.

import urllib.request
import re
from bs4 import BeautifulSoup
url = "http://www.example.com"  # urlopen needs the scheme (http:// or https://)

def hi():
    dep = urllib.request.urlopen(url)
    soup = BeautifulSoup(dep, 'html.parser')
    for link in soup.find_all('p', string=True):
        result = re.sub(b'<.*?>', "", link)
        print (result)
hi()

The website link.

and make sure to include the full traceback as text and what you have tried to solve the issue.
– timgeb, Commented Mar 11, 2016 at 10:30
Convert your screenshot to valid code and a valid traceback. What you posted before editing was really messy.
– Nikolay Fominyh, Commented Mar 12, 2016 at 14:54

ZygD · Accepted Answer · 2017-04-10 22:20:08Z

8

I believe link is a BeautifulSoup object here (find_all('p', ...) yields Tag elements), not a plain string, so re.sub rejects it.

Convert it to a string explicitly:

for link in soup.find_all('p', string=True):
    result = re.sub(b'<.*?>', "", str(link))
    print (result)
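
To make the conversion concrete, here is a minimal sketch. It assumes an inline sample HTML string instead of a fetched page, and the names html, TAG_PATTERN and strip_tags are made up for illustration. It shows the error raised when an element from find_all is passed to re.sub directly, the str() conversion, and get_text() as a regex-free alternative. Note that it uses a str pattern, because the input is a str after the conversion.

```python
import re
from bs4 import BeautifulSoup

# Inline sample HTML (an assumption, used instead of fetching a URL).
html = "<html><body><p>Hello <b>world</b></p><p>Second paragraph</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# A str pattern, so it can be applied to str input.
TAG_PATTERN = re.compile(r"<.*?>")

def strip_tags(element):
    """Convert a BeautifulSoup element to str, then remove anything tag-like."""
    return TAG_PATTERN.sub("", str(element))

paragraphs = soup.find_all("p")

# Passing the element itself to re.sub raises TypeError.
try:
    re.sub(r"<.*?>", "", paragraphs[0])
except TypeError as exc:
    error_message = str(exc)

# Converting with str() first works.
stripped = [strip_tags(p) for p in paragraphs]

# get_text() returns the text content directly, with no regex involved.
texts = [p.get_text() for p in paragraphs]

for line in stripped:
    print(line)
```

If only the text is needed, get_text() is probably the simpler choice, since a regex like <.*?> can misfire on markup that contains > inside attribute values.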

edited Apr 10, 2017 at 22:20

ZygD

answered Mar 12, 2016 at 14:51

Nikolay Fominyh

5 Comments

Vasanth Prabakar Over a year ago

But now it shows me a different error: TypeError: cannot use a bytes pattern on a string-like object.

Nikolay Fominyh Over a year ago

Change b'<.*?>' to r'<.*?>'.

Vasanth Prabakar Over a year ago

Hurrayyyy... Thank you so much. Can you explain that line? I copied it from some other code and I don't know the logic behind it :)

Nikolay Fominyh Over a year ago

@VasanthPrabakar, r is for regexp, b is for byte. Their behaviour also varies from version to version of python.

zondo Over a year ago

r is for raw, not regexp. It helps with regular expressions because they use `\` differently from how strings normally treat it, but it was not created for regular expressions; it just means raw.
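
The difference between the two prefixes can be checked with a short sketch that uses only the standard re module. The sample string and the variable names are assumptions made for illustration.

```python
import re

# r"..." is a raw str literal: the backslash is kept as written.
raw_newline = r"\n"          # two characters: a backslash and the letter n
escaped_newline = "\n"       # one character: a newline

# b"..." is a bytes literal, which is a different type from str.
bytes_pattern = b"<.*?>"
str_pattern = r"<.*?>"

text = "<p>Hello</p>"

# A str pattern on str input works.
clean_text = re.sub(str_pattern, "", text)

# A bytes pattern on bytes input also works (the replacement must be bytes too).
clean_bytes = re.sub(bytes_pattern, b"", text.encode("utf-8"))

# Mixing a bytes pattern with str input raises TypeError.
try:
    re.sub(bytes_pattern, "", text)
except TypeError as exc:
    mismatch_error = str(exc)
```

For this particular pattern the r prefix changes nothing, because <.*?> contains no backslashes; a plain '<.*?>' would behave the same. What matters is that the pattern and the input are both str or both bytes.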
