Java REGEX to match an exact number of digits in a string

Question

I tried to find an answer in the existing questions, but the search returns more than a thousand results, and after scanning a few dozen of them I gave up. So here is my problem.

I want to be able to find the first sequence of exactly six digits in a string. Given the string “Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end” I want to find the regex that will match the 123456 sequence.

I am new to regex, and a short explanation of how it works would help a lot.

Thank you in advance

Will the six-digit number always be the same? Will it always be separated by spaces? You may not need to use regex at all if that's the case. I'm just curious because you didn't specify the nature of the six-digit number.
– Elliot Bonneville, Commented Mar 9, 2012 at 2:08
I am interested in finding a sequence of exactly 6 digits, regardless of which ones they are. The sequence may be surrounded by any characters, with or without spaces. By any I mean any UTF-8 characters. Actually my searched string is in Traditional Chinese and I have no idea what that may be. It is important that six digits which are part of a longer digit sequence do not produce a match.
– Julian, Commented Mar 9, 2012 at 2:16
That is not what I want. 987654 is part of a sequence of more than 6 digits (987654321) and I want to exclude that. Hope that clarifies it. Thanks.
– Julian, Commented Mar 9, 2012 at 2:19

ruakh · Accepted Answer · 2012-03-09 02:23:53Z

22

You can use the pattern (?<!\d)\d{6}(?!\d), which means "a string-position that is not preceded by a digit; followed by exactly six digits; followed by a string-position that is not followed by a digit". (The notation (?<!...), known as a negative lookbehind assertion, means "not preceded by ...". The notation (?!...), known as a negative lookahead assertion, means "not followed by ...". The notation \d means a digit. The notation {n} means "n times", so that e.g. \d{6} means "six digits".)

That could look like this:

final String number;
{
    final Matcher m = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)").matcher(input);
    if(m.find())
        number = m.group(); // retrieve the matched substring
    else
        number = null; // no match found
}
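For anyone who wants to try this directly, here is a minimal self-contained sketch of the same idea. The class name SixDigitFinder and the helper firstSixDigits are made up for illustration; the pattern and the test string come from the question and the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, named only for this example.
class SixDigitFinder {
    // Not preceded by a digit, exactly six digits, not followed by a digit.
    static final Pattern SIX_DIGITS = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");

    /** Returns the first run of exactly six digits in input, or null if there is none. */
    static String firstSixDigits(String input) {
        Matcher m = SIX_DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Some text 987654321 and some more text 123456"
                + " and some other text again 654321 and more text in the end";
        System.out.println(firstSixDigits(input));               // prints 123456
        System.out.println(firstSixDigits("only 1234567 here")); // prints null
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call, which matters if the method is used in a loop.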

Note: a previous version of this answer suggested the use of word boundaries, \b; but one of your comments suggests that the digits might be immediately preceded or followed by Traditional Chinese characters, which are considered word characters (and therefore wouldn't trigger a word boundary), so I've changed that.

edited Mar 9, 2012 at 2:23

answered Mar 9, 2012 at 2:15

ruakh


3 Comments

stema Over a year ago

\w, \b, ... are ASCII based in java (so your \b should have accidentally worked), you can correct this behaviour since Java 7 by using the flag UNICODE_CHARACTER_CLASS, see here

ruakh Over a year ago

@stema: In Java, although \w is ASCII-based by default, \b is Unicode-based. (Dunno why.)

Eric Over a year ago

I was looking for a solution that doesn't involve \b! You're my life saver!

tchrist · Accepted Answer · 2012-03-09 02:22:05Z

6

The pattern you’re looking for is:

(?x)              # enable comments
(?<! \p{Nd} )     # no decimal number before
\p{Nd} {6}        # exactly six repetitions of a decimal number
(?! \p{Nd} )      # no decimal number after

That will also pick up things like

U+FF10 ‭ ０ FULLWIDTH DIGIT ZERO
U+FF11 ‭ １ FULLWIDTH DIGIT ONE
U+FF12 ‭ ２ FULLWIDTH DIGIT TWO
U+FF13 ‭ ３ FULLWIDTH DIGIT THREE
U+FF14 ‭ ４ FULLWIDTH DIGIT FOUR
U+FF15 ‭ ５ FULLWIDTH DIGIT FIVE
U+FF16 ‭ ６ FULLWIDTH DIGIT SIX
U+FF17 ‭ ７ FULLWIDTH DIGIT SEVEN
U+FF18 ‭ ８ FULLWIDTH DIGIT EIGHT
U+FF19 ‭ ９ FULLWIDTH DIGIT NINE

In case you have those in Chinese text.
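As a rough illustration of the difference, the sketch below runs the \p{Nd} version and the plain \d version against a string containing fullwidth digits. It assumes Java's default behaviour, where \d matches only ASCII 0-9 unless UNICODE_CHARACTER_CLASS is set. The pattern is written on one line without (?x), and the class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named only for this example.
class UnicodeSixDigits {
    // Any Unicode decimal digit (general category Nd), including fullwidth forms.
    static final Pattern ANY_SCRIPT = Pattern.compile("(?<!\\p{Nd})\\p{Nd}{6}(?!\\p{Nd})");
    // ASCII digits only, since no Unicode flag is set.
    static final Pattern ASCII_ONLY = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");

    /** Returns the first match of p in input, or null if there is none. */
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // FULLWIDTH DIGIT ONE..SIX (U+FF11..U+FF16), written as escapes
        // so the source file stays ASCII.
        String fullwidth = "\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16";
        // The digits sit directly between two CJK characters, with no spaces.
        String input = "\u4E2D" + fullwidth + "\u6587";

        System.out.println(fullwidth.equals(first(ANY_SCRIPT, input))); // prints true
        System.out.println(first(ASCII_ONLY, input));                   // prints null
    }
}
```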

answered Mar 9, 2012 at 2:22

tchrist


1 Comment

user166390 Over a year ago

Very nice: +1 for globalization support and not being bound to whitespace.

takteek · Accepted Answer · 2012-03-09 02:19:42Z

1

The first occurrence of 6 digits in the string you posted is actually 987654. If you mean the first occurrence of 6 digits surrounded by characters that are not digits, then this should work:

(?<!\d)(\d{6})(?!\d)

EDIT: This approach uses a negative lookbehind and a negative lookahead. It's slightly different than the word boundary approach in that it will match 123456 in the following strings

123456asdf some text hello

another string a123456 aaaaaaaa

If the numbers will always be surrounded by spaces then the word boundary approach is probably better.
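To make the difference concrete, here is a small sketch comparing the two approaches on the strings above. The class name BoundaryComparison and the helper found are invented for the example; \b\d{6}\b is assumed to be the word boundary pattern being referred to.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, named only for this example.
class BoundaryComparison {
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\b\\d{6}\\b");
    static final Pattern LOOKAROUND = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");

    /** True if p matches anywhere in s. */
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String gluedAfter = "123456asdf some text hello";
        String gluedBefore = "another string a123456 aaaaaaaa";
        String spaced = "some text 123456 and more";

        // Letters are word characters, so there is no boundary between a letter and a digit.
        System.out.println(found(WORD_BOUNDARY, gluedAfter));  // prints false
        System.out.println(found(WORD_BOUNDARY, gluedBefore)); // prints false
        System.out.println(found(WORD_BOUNDARY, spaced));      // prints true

        // The lookarounds only care whether the neighbours are digits.
        System.out.println(found(LOOKAROUND, gluedAfter));     // prints true
        System.out.println(found(LOOKAROUND, gluedBefore));    // prints true
        System.out.println(found(LOOKAROUND, spaced));         // prints true
    }
}
```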

answered Mar 9, 2012 at 2:19

takteek


1 Comment

Julian Over a year ago

In my example I made it clear what I would like to match. Maybe the question was not quite clear. But your regex worked. Thank you very much.

kasavbere · Accepted Answer · 2012-03-09 02:31:05Z

1

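public static String splitting(String str, int num) {
    // Split on every non-digit character; each run of digits becomes one element.
    String[] arr = str.split("[^0-9]");
    for (String s : arr)
        if (s.length() == num)
            return s;   // first digit run of exactly num characters
    return null;        // no digit run of that length
}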

test with

public static void main(String[] args) {
    String s =  "Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end";
    System.out.println(splitting(s, 6));
}

output is

123456

answered Mar 9, 2012 at 2:31

kasavbere


techraf · Accepted Answer · 2016-05-07 00:21:10Z

0

This works in the JavaScript console. Watch out for the doubled backslash in \\d, which is needed because the pattern is passed to RegExp as a string:

replacedString = "rx14ax145N".replace(RegExp("x14(?!\\d)", "g"), "___");

r___ax145N
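Since the question is about Java, here is a sketch of what the same replacement might look like there. The doubled backslash is needed for the same reason, because the pattern is a string literal. The class name LookaheadReplace and the method maskX14 are invented for the example.

```java
// Hypothetical demo class, named only for this example.
class LookaheadReplace {
    /** Replaces every "x14" that is not immediately followed by a digit. */
    static String maskX14(String s) {
        // String.replaceAll treats its first argument as a regular expression.
        return s.replaceAll("x14(?!\\d)", "___");
    }

    public static void main(String[] args) {
        // The second "x14" is followed by '5', so the lookahead rejects it.
        System.out.println(maskX14("rx14ax145N")); // prints r___ax145N
    }
}
```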

edited May 7, 2016 at 0:21

techraf


answered May 6, 2016 at 23:23

Stefan Varga

