Python regex not working

Question

I am using the following code:

downloadlink = re.findall("http://uploadir.com/u/(.*)\b", str(downloadhtml))

However, when I pass it the following string:

<input type="text" value="http://uploadir.com/u/bb41c5b3" />

It finds nothing, when I'm expecting it to find: http://uploadir.com/u/bb41c5b3. What am I doing wrong?

I have tested the regex using http://gskinner.com/RegExr/ and it seems to be correct. Am I missing something here?

unutbu · Accepted Answer · 2011-01-15 17:18:50Z

13

Get in the habit of making all regex patterns with raw strings:

In [16]: re.findall("http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[16]: []

In [17]: re.findall(r"http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[17]: ['bb41c5b3']

The difference is due to \b being interpreted differently:

In [18]: '\b'
Out[18]: '\x08'

In [19]: r'\b'
Out[19]: '\\b'

'\b' is an ASCII backspace character, while r'\b' is a string composed of two characters, a backslash and a b, which the re module then interprets as a word boundary.
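To see the difference concretely, it can help to compare the two pattern strings directly. This is a minimal sketch using the pattern and HTML from the question; the variable names (plain, raw, escaped) are only for illustration.

```python
import re

html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# In a normal string literal, \b is turned into a backspace character (\x08)
plain = "http://uploadir.com/u/(.*)\b"
# In a raw string, \b stays as backslash + b, which re reads as a word boundary
raw = r"http://uploadir.com/u/(.*)\b"
# Doubling the backslash in a normal string produces the same pattern as the raw string
escaped = "http://uploadir.com/u/(.*)\\b"

print(len(raw) - len(plain))    # 1 (the raw pattern is one character longer)
print(raw == escaped)           # True
print(re.findall(plain, html))  # []
print(re.findall(raw, html))    # ['bb41c5b3']
```

The non-raw pattern only matches if the input contains a literal backspace character, which this HTML does not.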

edited Jan 15, 2011 at 17:18

answered Jan 15, 2011 at 17:11

unutbu


joksnet · Accepted Answer · 2011-01-15 17:11:50Z

9

>>> import re
>>> html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'
>>> # [^"]+ captures everything up to the closing quote, so no \b is needed
>>> regex = r'http://uploadir.com/u/([^"]+)'
>>> link = re.findall(regex, html)
>>> link
['bb41c5b3']
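Note that the question expected the full URL, while re.findall returns only the captured group when the pattern contains one. A small sketch of both variants, assuming the same input string; escaping the dots is an addition here (not in the original answers) so that they match a literal "." only.

```python
import re

html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# No capturing group: findall returns the entire match
full_links = re.findall(r'http://uploadir\.com/u/[^"]+', html)
print(full_links)  # ['http://uploadir.com/u/bb41c5b3']

# One capturing group: findall returns only the captured part
ids = re.findall(r'http://uploadir\.com/u/([^"]+)', html)
print(ids)  # ['bb41c5b3']
```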

answered Jan 15, 2011 at 17:11

joksnet


1 Comment

matthewgall Over a year ago

You genius! Thanks so much! Have to wait 5 minutes to mark as fixed >.>
