2

I have the following code and would like to make it compatible with both Python 2.7 and Python 3.6:

from re import sub, findall

return sub(r'  ', ' ', sub(r'(\s){2,}', ' ', sub(r'[^a-z|\s|,]|_|(x)\1{1,}', '', x.lower())))

I received the following error: TypeError: cannot use a string pattern on a bytes-like object

I understand that Python 3 distinguishes between bytes and (Unicode) strings, but I'm not sure how to proceed.

Thanks.

I tried the following, but it does not work:

return sub(rb'  ', b' ', sub(rb'(\s){2,}', b' ',sub(rb'[^a-z|\s|,]|_|(x)\1{1,}', b'', x.lower())))

2 Answers

1

Have you tried using re.findall? For instance:

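import re

respdata = b''    # placeholder: replace with the data you are reading

# Placeholder pattern: replace it with the expression you want to match.
# Note: on Python 3, str() of a bytes object yields its repr ("b'...'"),
# so respdata.decode('utf-8') is usually the safer conversion.
content = re.findall(r'#findall from and too#', str(respdata))    # output in string
for contents in content:
    print(contents)    # print results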

0

The "string" you have must be a series of bytes, which you can convert to a real string using x.decode('utf-8'). You can see the problem with a simple example:

>>> import re
>>> s = bytes('hello', 'utf-8')
>>> s
b'hello'
>>> re.search(r'[he]', s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object
>>> s.decode('utf-8')
'hello'
>>> re.search(r'[he]', s.decode('utf-8'))
<re.Match object; span=(0, 1), match='h'>

I'm assuming your bytes represent UTF-8 data, but if you're working with a different encoding then just pass its name to decode() instead.
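Applied to the code in the question, that could look roughly like the sketch below. The function name clean_text and its encoding parameter are my own placeholders, not something from the question, and I'm assuming the input may arrive either as bytes or as text. The isinstance check also works on Python 2.7, where bytes is just an alias for str, so a plain 2.7 str gets decoded to unicode before the substitutions run.

```python
import re


def clean_text(x, encoding='utf-8'):
    """Apply the question's three substitutions to bytes or text input.

    clean_text and encoding are hypothetical names used for illustration.
    """
    # Python 3: bytes must be decoded before str patterns can be used.
    # Python 2.7: bytes is str, so this decodes a plain str to unicode.
    if isinstance(x, bytes):
        x = x.decode(encoding)
    x = x.lower()
    # Same patterns as in the question, split into separate steps.
    x = re.sub(r'[^a-z|\s|,]|_|(x)\1{1,}', '', x)
    x = re.sub(r'(\s){2,}', ' ', x)
    return re.sub(r'  ', ' ', x)


print(clean_text(b'Hello,   World!!'))  # hello, world
print(clean_text('Hello,   World!!'))   # hello, world
```

The result is always a text string (str on Python 3, unicode on Python 2.7), so encode it again if the caller needs bytes back.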
