0

I am trying to match all phone numbers in a piece of text.

https://pythex.org/?regex=%5C%2B%3F(%5Cd*)%5Cs%3F%5C(%3F(%5Cd*)%5C)%3F%5Cs%3F(%5Cd*)%5B%5Cs-%5D%3F(%5Cd*)%5Cs%3F(%5Cd*)%5Cs%3F(%5Cd*)%5Cs%3F&test_string=(510)%20588-3915%0A%2B1%20(510)%20879-4700%0A%2B1(888)654-0143%0A%2B1(919)277-2172%0A%2B1(866)707-7709%0A%2B1(919)597-7014%0A%2B44%20(0)%2020%208435%206555%0A%2B44%20(0)%2020%208435%206555%0A%2B33%201%2070%2070%2096%2061%0A%2B41%20(44)%20595%2094%2001%0A%2B32%20(9)%20277%2094%2021%0A%2B34%20(0)%20931%20790%20659%0A045%204750666%0A%2B41%2044%20595%2094%2001%0A%2B31%20(0)%2020%20262%203824%20okay.2%0A%2B31%20478-511014%0A%2B32%209%20277%2094%2021%0A%2B91%20900%20133%205555&ignorecase=0&multiline=0&dotall=0&verbose=0

This regex works well when I check it on a regex-testing website, but when I use it in actual code it gives me the wrong result:

>>> import re
>>> text = 'my phone is +31 478-511014 and +91 900 133 5555'
>>> mobile = re.findall(r'\+?(\d*)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?', text)
>>> mobile
[('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('31', '478', '', '511014', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('91', '900', '133', '5555', '', ''), ('', '', '', '', '', '')]

Am I doing something wrong?

6
  • If I enter your string ('my phone is +31 478-511014 and +91 900 133 5555') on the website it gives me pretty much the same result. Commented Nov 8, 2017 at 13:32
  • All the pattern parts are optional, so it matches empty strings before each non-matching sequence, as well as whitespace chunks. Rewrite it to match at least some digits. If you need help, please post the pattern requirements. Or filter out blanks (demo). Commented Nov 8, 2017 at 13:34
  • To implement the above suggestion you could change the first \d* to \d+ Commented Nov 8, 2017 at 13:38
  • I forgot to add to the above comment: every (...) creates an item in the resulting list of tuples, so you need to either remove the unnecessary groups or turn the necessary ones into non-capturing groups. Or use re.finditer as in my demo. I have not considered + and - though, so it is not a final answer. Commented Nov 8, 2017 at 13:40
  • Thanks. It matches on the website, but when I try it programmatically it returns the values as split-up strings, like [('31', '478', '', '511014', '', ''), ('91', '900', '133', '5555', '', '')]. How can I get the exact phone number as the result? Commented Nov 8, 2017 at 13:41
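The suggestions in the comments above (use re.finditer and filter out the blank matches) could look roughly like the sketch below. It keeps the pattern from the question unchanged and reads the whole matched text via group(0) instead of the per-group tuples. The variable names and the "contains a digit" filter are illustrative assumptions, not part of the original demo.

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# Pattern from the question, unchanged: every part is optional,
# so it also produces empty and whitespace-only matches.
pattern = re.compile(
    r'\+?(\d*)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?'
)

# group(0) is the full matched text rather than a tuple of groups.
# Keep only matches that contain at least one digit, and trim the
# trailing whitespace that the optional \s? parts may have consumed.
numbers = [
    m.group(0).strip()
    for m in pattern.finditer(text)
    if any(ch.isdigit() for ch in m.group(0))
]

print(numbers)  # ['+31 478-511014', '+91 900 133 5555']
```

This only hides the blank matches; tightening the pattern so it requires at least one digit avoids producing them in the first place.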

2 Answers

1

This one should work if you have issues with yours:

[(+\d]\d[-\d\s()]*\d
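As a quick check, here is a minimal sketch of how this pattern might be used with re.findall on the sample text from the question. Because the pattern has no capturing groups, findall returns the whole matched strings. The variable names are only illustrative.

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# No capturing groups, so findall returns full matches as strings.
# The pattern must start with '(', '+' or a digit, and end on a digit.
phone_re = re.compile(r'[(+\d]\d[-\d\s()]*\d')

numbers = phone_re.findall(text)
print(numbers)  # ['+31 478-511014', '+91 900 133 5555']

# Caveat: any run of three or more digits also matches,
# so unrelated numbers such as years are picked up too.
print(phone_re.findall('posted in 2017'))  # ['2017']
```

Note that the pattern is deliberately loose, so some post-filtering (for example on the number of digits) may be needed on real text.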

3 Comments

It gives sre_constants.error: bad character range
Edited: the - should come first inside the square brackets, and parentheses don't need to be escaped there.
Thanks. Can you please help me fix the regex I posted in the question?
0

It works if you replace one of the * with +:

mobile = re.findall(r'\+?(\d+)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?', text)
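With the first \d* changed to \d+, the empty matches disappear, but findall still returns one tuple per match because of the capturing groups. A possible way to get each number as a single string, sketched here as an assumption rather than as part of this answer, is to drop the parentheses around the digit parts and strip any trailing whitespace:

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# With capturing groups: one tuple of groups per match.
grouped = re.findall(
    r'\+?(\d+)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?', text
)
print(grouped)
# [('31', '478', '', '511014', '', ''), ('91', '900', '133', '5555', '', '')]

# Same pattern without capturing groups: findall returns whole matches.
# The optional \s? parts can still consume a trailing space, hence strip().
flat = [
    s.strip()
    for s in re.findall(
        r'\+?\d+\s?\(?\d*\)?\s?\d*[\s-]?\d*\s?\d*\s?\d*', text
    )
]
print(flat)  # ['+31 478-511014', '+91 900 133 5555']
```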
