10

I tried to find an answer in the existing questions, but the search returned more than a thousand results, and after scanning a few dozen of them I gave up. So here is my problem.

I want to be able to find the first sequence of exactly six digits in a string. Given the string “Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end” I want to find the regex that will match the 123456 sequence.

I am new to regex, so a short explanation of how it works would help a lot.

Thank you in advance

3 Comments
  • Will the six digit number always be the same? Will it always be separated by spaces? You may not need to use Regex at all if that's the case. I'm just curious because you didn't specify the nature of the six digit number. Commented Mar 9, 2012 at 2:08
  • I am interested in finding a sequence of exactly 6 digits, regardless of which digits they are. The sequence may be surrounded by any characters, with or without spaces. By "any" I mean any UTF-8 characters. My actual search string is in Traditional Chinese, and I have no idea what the surrounding characters may be. It is important that six digits that are part of a longer sequence of digits do not produce a match. Commented Mar 9, 2012 at 2:16
  • That is not what I want. 987654 is part of a sequence of more than 6 digits (987654321), and I want to exclude that. I hope that clarifies it. Thanks. Commented Mar 9, 2012 at 2:19

5 Answers

22

You can use the pattern (?<!\d)\d{6}(?!\d), which means "a string-position that is not preceded by a digit; followed by exactly six digits; followed by a string-position that is not followed by a digit". (The notation (?<!...), known as a negative lookbehind assertion, means "not preceded by ...". The notation (?!...), known as a negative lookahead assertion, means "not followed by ...". The notation \d means a digit. The notation {n} means "n times", so that e.g. \d{6} means "six digits".)

That could look like this:

final String number;
{
    final Matcher m = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)").matcher(input);
    if(m.find())
        number = m.group(); // retrieve the matched substring
    else
        number = null; // no match found
}

Note: a previous version of this answer suggested the use of word boundaries, \b; but one of your comments suggests that the digits might be immediately preceded or followed by Traditional Chinese characters, which are considered word characters (and therefore wouldn't trigger a word boundary), so I've changed that.
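For anyone who wants to try the lookaround pattern end to end, here is a minimal self-contained sketch. The class name SixDigitFinder and the method findFirst are made up for illustration, and the test strings are assumptions based on the question's sample and on the comment about digits touching Chinese characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; the names are illustrative only.
class SixDigitFinder {
    // Not preceded by a digit, exactly six digits, not followed by a digit.
    private static final Pattern SIX_DIGITS =
            Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");

    /** Returns the first run of exactly six digits, or null if there is none. */
    static String findFirst(String input) {
        Matcher m = SIX_DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sample = "Some text 987654321 and some more text 123456 "
                + "and some other text again 654321 and more text in the end";
        System.out.println(findFirst(sample)); // 123456

        // Six digits directly between CJK characters (written as escapes).
        System.out.println(findFirst("\u4e2d\u6587123456\u4e2d\u6587")); // 123456

        // A seven-digit run contains no run of exactly six digits.
        System.out.println(findFirst("only 1234567 here")); // null
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters only if the method is called often.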


3 Comments

\w, \b, ... are ASCII-based in Java (so your \b should have accidentally worked). Since Java 7 you can change this behaviour by using the flag UNICODE_CHARACTER_CLASS.
@stema: In Java, although \w is ASCII-based by default, \b is Unicode-based. (Dunno why.)
I was looking for a solution that doesn't involve \b! You're my life saver!
6

The pattern you’re looking for is:

(?x)              # enable comments
(?<! \p{Nd} )     # no decimal number before
\p{Nd} {6}        # exactly six repetitions of a decimal number
(?! \p{Nd} )      # no decimal number after

That will also pick up things like

U+FF10 ０ FULLWIDTH DIGIT ZERO
U+FF11 １ FULLWIDTH DIGIT ONE
U+FF12 ２ FULLWIDTH DIGIT TWO
U+FF13 ３ FULLWIDTH DIGIT THREE
U+FF14 ４ FULLWIDTH DIGIT FOUR
U+FF15 ５ FULLWIDTH DIGIT FIVE
U+FF16 ６ FULLWIDTH DIGIT SIX
U+FF17 ７ FULLWIDTH DIGIT SEVEN
U+FF18 ８ FULLWIDTH DIGIT EIGHT
U+FF19 ９ FULLWIDTH DIGIT NINE

In case you have those in Chinese text.
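As a rough sketch of how the \p{Nd} variant might be used from Java, the following writes the same pattern in compact form (without the (?x) comments) and applies it to fullwidth digits. The class UnicodeSixDigits and both method names are hypothetical, and the conversion to ASCII digits is an extra step that this answer does not itself propose.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; names are for illustration only.
class UnicodeSixDigits {
    // \p{Nd} is the Unicode "decimal digit" general category, so this
    // covers fullwidth digits as well as ASCII 0-9.
    private static final Pattern SIX_UNICODE_DIGITS =
            Pattern.compile("(?<!\\p{Nd})\\p{Nd}{6}(?!\\p{Nd})");

    /** Returns the first run of exactly six decimal digits, or null. */
    static String findFirst(String input) {
        Matcher m = SIX_UNICODE_DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Assumed helper: maps each Unicode decimal digit to its ASCII form. */
    static String toAsciiDigits(String digits) {
        StringBuilder sb = new StringBuilder(digits.length());
        digits.codePoints().forEach(cp -> sb.append(Character.digit(cp, 10)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // FULLWIDTH DIGIT ONE .. SIX, written as escapes.
        String fullwidth = "\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16";
        String input = "\u4e2d\u6587" + fullwidth + "\u4e2d\u6587";

        String match = findFirst(input);
        System.out.println(match != null);        // true
        System.out.println(toAsciiDigits(match)); // 123456
    }
}
```

By default Java's \d matches only ASCII 0-9, so the \d version of the pattern would find nothing in this input. That is the practical difference between the two patterns.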

1 Comment

Very nice: +1 for globalization support and not being bound to whitespace.
1

The first occurrence of 6 digits in the string you posted is actually 987654. If you mean the first occurrence of 6 digits surrounded by characters that are not digits, then this should work:

(?<!\d)(\d{6})(?!\d)

EDIT: This approach uses a negative lookbehind and a negative lookahead. It differs slightly from the word boundary approach in that it will match 123456 in the following strings:

123456asdf some text hello

another string a123456 aaaaaaaa

If the numbers will always be surrounded by spaces then the word boundary approach is probably better.
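To make the difference concrete, here is a small sketch that runs both patterns over the two strings above plus one where the digits are surrounded by spaces. The class BoundaryComparison and its found helper are illustrative names, and the third test string is an assumption added for contrast.

```java
import java.util.regex.Pattern;

// Hypothetical comparison harness; names are illustrative only.
class BoundaryComparison {
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\b\\d{6}\\b");
    static final Pattern LOOKAROUND = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");

    /** True if the pattern matches anywhere in the input. */
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "123456asdf some text hello",
            "another string a123456 aaaaaaaa",
            "some text 123456 more text"
        };
        for (String input : inputs) {
            System.out.println(input
                    + " | word boundary: " + found(WORD_BOUNDARY, input)
                    + " | lookaround: " + found(LOOKAROUND, input));
        }
    }
}
```

The word boundary pattern finds a match only in the third string, because a letter next to a digit does not form a boundary. The lookaround pattern matches in all three.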

1 Comment

In my example I made it clear what I would like to match. Maybe the question was not quite clear. But your regex worked. Thank you very much.
1
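public static String splitting(String str, int num) {
    // Split on every character that is not an ASCII digit; the pieces are the
    // digit runs (plus empty strings between adjacent separators).
    String[] arr = str.split("[^0-9]");
    for (String s : arr) {
        // A piece of exactly num characters is a run bounded by non-digits.
        if (s.length() == num) {
            return s;
        }
    }
    return null; // no run of exactly num digits found
}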

test with

 public static void main(String[] args) {
    String s =  "Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end";
    System.out.println(splitting(s, 6));
}

output is

  123456


0

This works in the JavaScript console. Watch out for the doubled backslash in \\d, which is needed because the pattern is passed as a string:

replacedString = "rx14ax145N".replace(RegExp("x14(?!\\d)", "g"), "___");

r___ax145N
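Since the question is about Java, the same negative lookahead can be sketched there with String.replaceAll. The class and method names below are made up for illustration.

```java
// Hypothetical class; names are illustrative only.
class NegativeLookaheadReplace {
    /** Replaces every "x14" that is not immediately followed by a digit. */
    static String maskX14(String input) {
        // The backslash is doubled because the regex is a Java string literal.
        return input.replaceAll("x14(?!\\d)", "___");
    }

    public static void main(String[] args) {
        System.out.println(maskX14("rx14ax145N")); // r___ax145N
    }
}
```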
