1

How can I identify strings containing more digits than non-digits using a regular expression (Pattern) in Java? Thank you.

6 Answers

12

That's not a regular language, and thus it cannot be captured by a vanilla regex. It may be possible anyway, but it will almost certainly be easier not to use a regex:

public static boolean moreDigitsThanNonDigits(String s) {
    int diff = 0;
    for(int i = 0; i < s.length(); ++i) {
        if(Character.isDigit(s.charAt(i))) ++diff;
        else --diff;
    }
    return diff > 0;
}
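
One detail that may matter when choosing between this loop and a regex: Character.isDigit accepts any Unicode decimal digit, whereas \d in java.util.regex matches only ASCII 0-9 unless UNICODE_CHARACTER_CLASS is set. If only ASCII digits should count, a minimal sketch could look like this (the class name DigitMajority and the method name moreAsciiDigitsThanNonDigits are made up for illustration):

```java
final class DigitMajority {
    // Hypothetical ASCII-only variant of the loop above:
    // +1 for each character in '0'..'9', -1 for everything else.
    static boolean moreAsciiDigitsThanNonDigits(String s) {
        int diff = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                diff++;
            } else {
                diff--;
            }
        }
        return diff > 0;
    }

    public static void main(String[] args) {
        // 3 digits vs 2 non-digits
        System.out.println(moreAsciiDigitsThanNonDigits("a1b22"));
        // 2 digits vs 2 non-digits (a tie is not "more")
        System.out.println(moreAsciiDigitsThanNonDigits("ab12"));
        // Arabic-Indic digits are digits for Character.isDigit, but not here
        System.out.println(moreAsciiDigitsThanNonDigits("\u0663\u0664x"));
    }
}
```

Which behaviour is right depends on the input you expect, so treat this as an alternative rather than a correction.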

10

You won't be able to write a regexp that does this. But you already said you're using Java, why not mix in a little code?

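public boolean moreDigitsThanNonDigits(String input) {
    // replaceAll takes a regex; plain replace would search for the literal text "[0-9]".
    String nonDigits = input.replaceAll("[0-9]", "");
    // digits + nonDigits > 2 * nonDigits is equivalent to digits > nonDigits.
    return input.length() > (nonDigits.length() * 2);
}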

2 Comments

Hi, can you please clarify my doubt: by using the java.util.regex package, will I be able to search for any kind of pattern in text files or in any kind of file format?
The java.lang.String replaceAll and replaceFirst methods accept a regex already. No need to bring in your own Pattern, Matcher, etc. instances.
3

Regular expressions are conceptually unable to perform such a task. They are equivalent to regular languages, i.e. to finite automata. They have no notion of memory (or a stack), so they cannot keep an unbounded count of the occurrences of symbols. The next step up in expressiveness is push-down automata (stack machines), which correspond to context-free grammars. Before writing such a grammar for this task, using a method like the moreDigitsThanNonDigits above would be appropriate.

2 Comments

Perl- (and Java-) style regular expressions are actually more powerful than regular languages, because of the "\number" syntax for backreferences to a captured group. They can recognize languages that are not regular. For example, the language of any string repeated twice (which is not regular, nor even context-free) can be recognized by "(.*)\1".
Thanks for pointing this out! Your example would be "(.*)\1\1", right? But length comparisons are still not possible, I would assume.
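
To make the backreference point from these comments concrete, here is a small sketch of both patterns in Java (the class name BackreferenceDemo and its method names are hypothetical). It suggests that "(.*)\1" accepts a string repeated twice, while "(.*)\1\1" would require three copies. Note that "." does not match line terminators by default, so this assumes single-line input.

```java
import java.util.regex.Pattern;

final class BackreferenceDemo {
    // \1 refers back to whatever group 1 captured.
    private static final Pattern DOUBLED = Pattern.compile("(.*)\\1");
    private static final Pattern TRIPLED = Pattern.compile("(.*)\\1\\1");

    // True if the whole string has the form ww.
    static boolean isDoubled(String s) {
        return DOUBLED.matcher(s).matches();
    }

    // True if the whole string has the form www.
    static boolean isTripled(String s) {
        return TRIPLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDoubled("abcabc"));  // "abc" followed by "abc"
        System.out.println(isDoubled("abcabd"));  // second half differs
        System.out.println(isTripled("ababab"));  // "ab" three times
        System.out.println(isTripled("abab"));    // only two copies
    }
}
```

Neither pattern compares the counts of two character classes, which is what the original question needs.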
1

As already mentioned the language in question is not regular and cannot be detected using a regular expression.

Here is one more way to count the digits and non-digits in a string using regex.

You can use the String.replaceAll method to delete all non-digits in the input string. The length of the resultant string will be the number of digits in the input.

Similarly you can delete all the digits in the input string and the length of the resultant string will be the number of non-digits in the input string.

public static boolean test(String str) {
    int numDigits = str.replaceAll("\\D", "").length();
    int numNonDigits = str.replaceAll("\\d", "").length();

    return numDigits > numNonDigits;
}
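
If building two intermediate strings feels wasteful, a possible variant counts the digit matches with a precompiled Pattern and derives the non-digit count from the string length. This is only a sketch; the class name DigitCount is an assumption, and "\\d" here matches ASCII digits only (Java's default).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitCount {
    // Compiled once and reused; matches a single ASCII digit by default.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    static boolean moreDigitsThanNonDigits(String str) {
        Matcher m = DIGIT.matcher(str);
        int digits = 0;
        // Each successful find() consumes exactly one digit character.
        while (m.find()) {
            digits++;
        }
        // Everything that is not a digit is a non-digit.
        int nonDigits = str.length() - digits;
        return digits > nonDigits;
    }

    public static void main(String[] args) {
        System.out.println(moreDigitsThanNonDigits("12a"));  // 2 digits vs 1 non-digit
        System.out.println(moreDigitsThanNonDigits("1ab"));  // 1 digit vs 2 non-digits
    }
}
```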

0

I'm not sure that using regular expressions would be the best solution here.

1 Comment

I do not insist on using a regular expression; I just need to identify those strings somehow.
0

A regex alone can't do it (since regexes don't count anything), but if you want to use them, just use two replacements: one that strips out all the digits and one that keeps only the digits. Then compare the lengths of the two resulting strings.

Of course, I'd rather use Dave's answer.

3 Comments

Hi, can you please clarify my doubt: by using the java.util.regex package, will I be able to search for any kind of pattern in text files or in any kind of file format?
Since regular expressions are used for matching patterns in a string, my doubt is whether Google uses this pattern-matching concept to search in all files?
Google does FAR more than simply compare regex to some files.
