I am using the following code:

downloadlink = re.findall("http://uploadir.com/u/(.*)\b", str(downloadhtml))

However, when I pass it the following string:

<input type="text" value="http://uploadir.com/u/bb41c5b3" />

It finds nothing, when I'm expecting it to find: http://uploadir.com/u/bb41c5b3. What am I doing wrong?

I have tested the regex using http://gskinner.com/RegExr/ and it seems to be correct. Am I missing something here?

2 Answers

Get in the habit of writing all regex patterns as raw strings:

In [16]: re.findall("http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[16]: []

In [17]: re.findall(r"http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[17]: ['bb41c5b3']

The difference is due to \b being interpreted differently:

In [18]: '\b'
Out[18]: '\x08'

In [19]: r'\b'
Out[19]: '\\b'

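'\b' is the ASCII backspace character, while r'\b' is a two-character string: a backslash followed by a b. Only the two-character form reaches the regex engine as the word-boundary assertion; the backspace version makes the pattern look for a literal backspace, which the HTML does not contain.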
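To make the difference concrete, here is a small sketch that runs both patterns against the input from the question. The variable names (`plain_pattern`, `raw_pattern`, `escaped_pattern`) are only illustrative and do not come from the original code.

```python
import re

html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# Without the r prefix, Python converts \b to a backspace (0x08)
# before the regex engine ever sees the pattern.
plain_pattern = "http://uploadir.com/u/(.*)\b"
# With the r prefix, the backslash and the b are both kept,
# so the regex engine sees the word-boundary assertion.
raw_pattern = r"http://uploadir.com/u/(.*)\b"

print(len("\b"), len(r"\b"))            # 1 2
print(re.findall(plain_pattern, html))  # []
print(re.findall(raw_pattern, html))    # ['bb41c5b3']

# Doubling the backslash in a normal string gives the same pattern
# as the raw string.
escaped_pattern = "http://uploadir.com/u/(.*)\\b"
print(escaped_pattern == raw_pattern)   # True
```

Either the raw string or the doubled backslash works; the raw string is usually easier to read once a pattern contains several escapes.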

>>> import re
>>> html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'
>>> regex = r'http://uploadir.com/u/([^"]+)'
>>> link = re.findall(regex, html)
>>> link
['bb41c5b3']
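The negated character class `[^"]+` avoids `\b` entirely by stopping at the closing quote of the attribute. It also behaves differently from the greedy `(.*)\b` when one line holds more than one link. A rough sketch of that difference follows; the second ID (`0a1b2c3d`) and the names `greedy` and `bounded` are made up for illustration, and the dots in the host name are escaped here as a small tightening of the pattern.

```python
import re

# Two inputs on a single line; the second ID is a hypothetical example.
html = (
    '<input type="text" value="http://uploadir.com/u/bb41c5b3" /> '
    '<input type="text" value="http://uploadir.com/u/0a1b2c3d" />'
)

# Greedy .* runs to the last word boundary on the line, so it swallows
# everything between the first ID and the end of the second ID.
greedy = re.findall(r'http://uploadir\.com/u/(.*)\b', html)

# [^"]+ stops at the first double quote, so each link is captured separately.
bounded = re.findall(r'http://uploadir\.com/u/([^"]+)', html)

print(len(greedy))  # 1
print(bounded)      # ['bb41c5b3', '0a1b2c3d']
```

For anything beyond a quick one-off, an HTML parser is likely a more robust way to read attribute values than a regex.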

1 Comment

You genius! Thanks so much! I have to wait 5 minutes to mark it as fixed >.>
