Using regex to match string

Question

I am trying to match all phone numbers from text.

https://pythex.org/?regex=%5C%2B%3F(%5Cd*)%5Cs%3F%5C(%3F(%5Cd*)%5C)%3F%5Cs%3F(%5Cd*)%5B%5Cs-%5D%3F(%5Cd*)%5Cs%3F(%5Cd*)%5Cs%3F(%5Cd*)%5Cs%3F&test_string=(510)%20588-3915%0A%2B1%20(510)%20879-4700%0A%2B1(888)654-0143%0A%2B1(919)277-2172%0A%2B1(866)707-7709%0A%2B1(919)597-7014%0A%2B44%20(0)%2020%208435%206555%0A%2B44%20(0)%2020%208435%206555%0A%2B33%201%2070%2070%2096%2061%0A%2B41%20(44)%20595%2094%2001%0A%2B32%20(9)%20277%2094%2021%0A%2B34%20(0)%20931%20790%20659%0A045%204750666%0A%2B41%2044%20595%2094%2001%0A%2B31%20(0)%2020%20262%203824%20okay.2%0A%2B31%20478-511014%0A%2B32%209%20277%2094%2021%0A%2B91%20900%20133%205555&ignorecase=0&multiline=0&dotall=0&verbose=0

This regex works well when I test it on a regex testing website, but when I use it in actual code, it gives me the wrong result:

>>> import re
>>> text = 'my phone is +31 478-511014 and +91 900 133 5555'
>>> mobile = re.findall(r'\+?(\d*)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?', text)
>>> mobile
[('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('31', '478', '', '511014', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('', '', '', '', '', ''), ('91', '900', '133', '5555', '', ''), ('', '', '', '', '', '')]

Am I doing something wrong?

If I enter your string ('my phone is +31 478-511014 and +91 900 133 5555') on the website, it gives me pretty much the same result.
– Klaus D., Commented Nov 8, 2017 at 13:32
All the pattern parts are optional, so it matches empty strings before each non-matching sequence and whitespace chunks. Rewrite it to match at least some digits. If you need help, please post the pattern requirements. Or filter out blanks (demo).
– Wiktor Stribiżew, Commented Nov 8, 2017 at 13:34
To implement the above suggestion, you could change the first \d* to \d+.
– Steve, Commented Nov 8, 2017 at 13:38
I forgot to add to the above comment: every (...) creates an item in the resulting list of tuples, so you need to either remove the unnecessary groups or turn the necessary ones into non-capturing groups. Or use re.finditer as in my demo. I have not considered + and -, though, so it is not a final answer.
– Wiktor Stribiżew, Commented Nov 8, 2017 at 13:40
Thanks. It matches on the website, but when I try it programmatically it gives the values as broken-up strings like [('31', '478', '', '511014', '', ''), ('91', '900', '133', '5555', '', '')]. How can I get the exact phone number as the result?
– user2129623, Commented Nov 8, 2017 at 13:41
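The suggestions in the comments above (use re.finditer and filter out the blanks) can be sketched roughly as follows. This is only an illustration, assuming the pattern and sample text from the question; the names PATTERN and numbers are made up for the example.

```python
import re

# Pattern copied from the question; every part of it is optional,
# so it also produces empty and whitespace-only matches.
PATTERN = re.compile(
    r'\+?(\d*)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?'
)

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# m.group(0) is the whole matched text, independent of the capturing groups.
# Keep only matches that contain at least one digit and trim the whitespace
# picked up by the trailing \s?.
numbers = [
    m.group(0).strip()
    for m in PATTERN.finditer(text)
    if any(ch.isdigit() for ch in m.group(0))
]

print(numbers)  # ['+31 478-511014', '+91 900 133 5555']
```

re.findall returns one tuple of groups per match when the pattern has several capturing groups, which is why the numbers come back in pieces; group(0) sidesteps that without changing the pattern.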

zipa · Accepted Answer · 2017-11-08 13:46:32Z

1

This one should work if you had issues with yours:

[(+\d]\d[-\d\s()]*\d
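A minimal sketch of how this pattern might be used with re.findall, assuming the sample text from the question (the names PHONE and found are illustrative). Because the pattern has no capturing groups, findall returns the whole matched strings rather than tuples.

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# Pattern from the answer: starts with '(', '+' or a digit, then a digit,
# then any run of digits, whitespace, hyphens and parentheses, and it
# must end on a digit.
PHONE = re.compile(r'[(+\d]\d[-\d\s()]*\d')

found = PHONE.findall(text)
print(found)  # ['+31 478-511014', '+91 900 133 5555']
```

One caveat: \s also matches newlines, so when a number is directly followed by a line that starts with a digit (for example 659 followed by 045 4750666 in the test data), the two may be merged into one match.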

edited Nov 8, 2017 at 13:46

answered Nov 8, 2017 at 13:39


3 Comments

user2129623 Over a year ago

It gives sre_constants.error: bad character range

zipa Over a year ago

Edited: the - should come first inside the square brackets, and parentheses don't need to be escaped there.

user2129623 Over a year ago

Thanks. Can you please help me use the regex I mentioned in the question?
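The "bad character range" error mentioned in these comments can be reproduced with a small sketch. The failing pattern below is hypothetical, since the pre-edit version of the answer is not shown in the thread; it only illustrates why the position of - inside square brackets matters.

```python
import re

# Hypothetical character class with '-' between two shorthand classes.
# The regex engine reads it as a range from \d to \s, which is invalid.
bad = r'[\d-\s]'
try:
    re.compile(bad)
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    message = str(exc)
    print(message)

# With '-' placed first, it is treated as a literal hyphen.
good = re.compile(r'[-\d\s]')
matched = good.findall('1-2 3')
print(matched)  # ['1', '-', '2', ' ', '3']
```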

Celestial Fury · Accepted Answer · 2017-11-08 13:54:14Z

0

It works if you replace one of the * with +:

mobile = re.findall(r'\+?(\d+)\s?\(?(\d*)\)?\s?(\d*)[\s-]?(\d*)\s?(\d*)\s?(\d*)\s?', text)
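With capturing groups, findall still returns each number as a tuple of pieces. One possible way to get the full phone numbers back, sketched here under the assumption that the individual groups are not needed, is to keep the \d+ change and drop the parentheses around the digit runs so that findall returns whole matches. The name NO_GROUPS is illustrative.

```python
import re

text = 'my phone is +31 478-511014 and +91 900 133 5555'

# Same structure as the pattern above (first digit run is \d+), but with
# the capturing parentheses removed; \( and \) remain as optional literals.
NO_GROUPS = re.compile(r'\+?\d+\s?\(?\d*\)?\s?\d*[\s-]?\d*\s?\d*\s?\d*\s?')

# Without groups, findall returns the full matched text; strip() removes
# the whitespace consumed by the trailing \s?.
mobile = [m.strip() for m in NO_GROUPS.findall(text)]
print(mobile)  # ['+31 478-511014', '+91 900 133 5555']
```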

answered Nov 8, 2017 at 13:54
