
It seems simple, but I can't get it to work.

I have a string that looks like 'NNDDDDDAAAA', where 'N' is a non-digit, 'D' is a digit, and 'A' is anything. I need to replace each A with a space character. The number of 'N's, 'D's, and 'A's varies from one input string to the next.

I know how to do it with two expressions: I can split the string into two groups and then replace everything in the second group with spaces, like this:

    Pattern pattern = Pattern.compile("(\\D+\\d+)(.+)");
    Matcher matcher = pattern.matcher(input);
    if (matcher.matches()) {
        return matcher.group(1) + matcher.group(2).replaceAll(".", " ");
    }

But I was wondering if it is possible with a single regex.

  • How would you tell the difference between the 'A's and the others? Do 'A's always come after the 'D's? Commented Jul 8, 2009 at 22:18
  • How do you tell the difference between the last "D" and the first "A"? Is the group of "A"s guaranteed not to be a "D" type character? Commented Jul 8, 2009 at 23:00
  • Curtis Tasker is correct: the first A after NNDDDD is always an N; the rest can be anything. Commented Jul 9, 2009 at 14:15

4 Answers


Given your description, I'm assuming that after the NNDDDDD portion, the first A will actually be an N rather than an A, since otherwise there's no solid boundary between the DDDDD and AAAA portions.

So, your string actually looks like NNDDDDDNAAA, and you want to replace the NAAA portion with spaces. Given this, the regex can be rewritten as such: (\\D+\\d+)(\\D.+)
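To see what the extra \\D buys, here is a minimal sketch (the class and method names are hypothetical) that runs both the question's pattern and this one on two sample inputs. When no non-digit follows the digits, the question's pattern backtracks and blanks the last digit, while this one simply does not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // The question's pattern: the tail may start with anything, including a digit.
    static final Pattern LOOSE = Pattern.compile("(\\D+\\d+)(.+)");
    // This answer's pattern: the tail must start with a non-digit (the N after the digits).
    static final Pattern STRICT = Pattern.compile("(\\D+\\d+)(\\D.+)");

    // Keeps group 1, turns every character of group 2 into a space; returns null on no match.
    static String blankTail(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + m.group(2).replaceAll(".", " ");
    }

    private static String show(String s) {
        return s == null ? "(no match)" : "'" + s + "'";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"AA12345d4 %", "AA1234"}) {
            System.out.println("'" + s + "' loose=" + show(blankTail(LOOSE, s))
                    + " strict=" + show(blankTail(STRICT, s)));
        }
    }
}
```

Note that \\D.+ needs at least two tail characters; \\D.* would also accept a one-character tail.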

Positive lookbehind in Java requires a pattern with a bounded maximum length; you can't use the + or * quantifiers. You can instead use curly braces to specify a maximum length. For instance, you can use {1,9} in place of each +, which matches between 1 and 9 characters: (?<=\\D{1,9}\\d{1,9})(\\D.+)

The only problem is that you're matching the NAAA sequence as a single match, so using "NNNDDDDNAAA".replaceAll("(?<=\\D{1,9}\\d{1,9})(\\D.+)", " ") replaces the entire NAAA sequence with a single space rather than with one space per character.
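A concrete version of that call, using a hypothetical sample input with real digits (the NNNDDDDNAAA string is only a placeholder for the shape and contains no digits itself, so the pattern would not match it literally):

```java
public class LookbehindDemo {
    // Bounded-length lookbehind: {1,9} gives the engine a finite maximum length to search.
    static final String REGEX = "(?<=\\D{1,9}\\d{1,9})(\\D.+)";

    public static void main(String[] args) {
        // Hypothetical sample: "ab" = N part, "1234" = D part, "c5 %" = tail.
        String input = "ab1234c5 %";
        String output = input.replaceAll(REGEX, " ");
        // The whole tail "c5 %" collapses into ONE space, not four.
        System.out.println("'" + input + "' -> '" + output + "'"); // prints 'ab1234c5 %' -> 'ab1234 '
    }
}
```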

You could take the start index of the match and the string length and use them to append the correct number of spaces, but I don't see the point. I think you're better off with your original solution; it's simple and easy to follow.
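For completeness, the start-index idea might look like this minimal sketch (the class and method names are hypothetical): Matcher.start(2) gives the index where the tail begins, and everything from there on is replaced by one space per character.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PadFromMatchStart {
    // The pattern from this answer: group 2 is the tail that starts with a non-digit.
    private static final Pattern P = Pattern.compile("(\\D+\\d+)(\\D.+)");

    static String blankTail(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return input; // no tail found: leave the input unchanged
        }
        int start = m.start(2);                      // index where the tail begins
        char[] spaces = new char[input.length() - start];
        Arrays.fill(spaces, ' ');                    // one space per tail character
        return input.substring(0, start) + new String(spaces);
    }

    public static void main(String[] args) {
        System.out.println("'" + blankTail("AA12345d4 %") + "'");
    }
}
```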

If you're looking for a little extra speed, you could compile your Pattern outside the function, and use StringBuilder or StringBuffer to create your output. If you're building a large String out of all these NNDDDDDAAAAA elements, work entirely in StringBuilder until you're done appending.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Test {

        // Compiled once and reused for every call.
        private static final Pattern p = Pattern.compile("(\\D+\\d+)(\\D.+)");

        public static StringBuilder replace(String input) {
            StringBuilder output = new StringBuilder();
            Matcher m = p.matcher(input);
            if (m.matches()) {
                // Keep the N/D prefix; turn every character of the tail into a space.
                output.append(m.group(1)).append(m.group(2).replaceAll(".", " "));
            } else {
                // No digit-to-non-digit boundary found: pass the input through unchanged.
                output.append(input);
            }
            return output;
        }

        public static void main(String[] args) {
            String input = args[0];

            StringBuilder tests = new StringBuilder();
            long startTime = System.currentTimeMillis();
            for (int i = 0; i < 50; i++) {
                tests.append("Input -> Output: '")
                     .append(input)
                     .append("' -> '")
                     .append(replace(input))
                     .append("'\n");
            }
            System.out.println(tests);
            System.out.println("\nElapsed: " + (System.currentTimeMillis() - startTime) + " ms");
        }
    }

Update: I wrote a quick iterative solution, and ran some random data through both. The iterative solution is around 4-5x faster.

    public static StringBuilder replace(String input) {
        StringBuilder output = new StringBuilder(input.length());
        boolean inDigits = false, inTail = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // The first digit marks the start of the D section.
            if (!inDigits && Character.isDigit(c))
                inDigits = true;
            // The first non-digit after the D section starts the tail
            // (it may be punctuation or whitespace, not only a letter).
            if (inDigits && !inTail && !Character.isDigit(c))
                inTail = true;
            // Everything from the start of the tail onward becomes a space.
            output.append(inTail ? ' ' : c);
        }
        return output;
    }


What do you mean by non-digit vs. anything?

    [^a-zA-Z0-9]

matches every character that is not an ASCII letter or digit.

You would want to replace anything matched by the above regex with a space.

Is this what you were talking about?
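As a quick check of what this suggestion does, here is the class applied with String.replaceAll to the sample input the OP gives in the comments ('AA12345d4 %', which the OP wants turned into 'AA12345' followed by four spaces). The class name is hypothetical. Only the non-alphanumeric characters are blanked; the 'd' and '4' in the tail are kept.

```java
public class CharClassDemo {
    public static void main(String[] args) {
        // Sample input taken from the OP's example in the comments.
        String input = "AA12345d4 %";
        // Replaces every character that is not an ASCII letter or digit with a space.
        String output = input.replaceAll("[^a-zA-Z0-9]", " ");
        System.out.println("'" + output + "'"); // prints 'AA12345d4  '
    }
}
```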

3 Comments

Don't you mean /[^a-zA-Z0-9]/ /g ?
That would delete the "anything" matches; I just wanted to put up the regex that actually matches "anything". I will take the slashes out to clear things up. Thanks.
'Anything' means anything, i.e. letters, digits, whitespace. I want to replace each occurrence with a space. For instance, 'AA12345d4 %' would be replaced with 'AA12345    ' (four spaces at the end).

You want to use positive lookbehind to match the N's and D's, then use a normal match for the A's.

I'm not sure of the positive lookbehind syntax in Java, but see this article on Java regex with lookbehind.
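The lookbehind idea can be applied to each character on its own instead of to one big group. A minimal sketch, assuming the tail is never longer than 999 characters (the cap is arbitrary and only there because a Java lookbehind needs a bounded maximum length); the class name is hypothetical:

```java
public class PerCharLookbehind {
    // Assumption: the tail is at most 999 characters long (lookbehind must be bounded).
    // Alternative 1 blanks any non-digit directly after a digit, which covers the N that
    // starts the tail. Alternative 2 blanks any character preceded (within the cap) by a
    // digit followed by a non-digit. Neither can match inside the N or D sections.
    static final String REGEX = "(?<=\\d)\\D|(?<=\\d\\D.{0,999}).";

    public static void main(String[] args) {
        String input = "AA12345d4 %";
        System.out.println("'" + input.replaceAll(REGEX, " ") + "'"); // prints 'AA12345    '
    }
}
```

Because the lookbehind is re-run at every position, this is probably slower than the two-step or loop versions; it mainly shows that a single replaceAll call can do the job under a length cap.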

5 Comments

I was just about to post that ... honest! I don't know if you are allowed to have a variable-length lookbehind pattern, though, e.g. (?<=\D+)
Not sure about the Java regex. I've read some articles about positive/negative lookahead/lookbehind restrictions in the three major variants of regex engines, and the main takeaway I had was that the .NET regex could do the good stuff, but sometimes just because it can doesn't mean you should.
Here's a nice description of various engines' support for lookbehind: regular-expressions.info/lookaround.html#limitbehind
No, in general they do not allow variable-width lookbehind. "(?<=\D+)" is allowed because it is equivalent to the fixed-width lookbehind "(?<=\D)"
And in any case, even if a lookbehind worked, it would not solve the OP's problem, which is to replace every character in the matched group with a space. There is no replacement string that will allow you to perform "replace this with a string of spaces of the same length".

I know you asked for a regex, but why do you even need a regex for this? How about:

    // Walk backwards from the end, blanking characters until the last digit is reached.
    StringBuilder sb = new StringBuilder(inputString);
    for (int i = sb.length() - 1; i >= 0; i--) {
        if (Character.isDigit(sb.charAt(i)))
            break;
        sb.setCharAt(i, ' ');
    }
    String output = sb.toString();

You might find this post interesting. Of course, the above code assumes there will be at least one digit in the string: all characters following the last digit are converted to spaces. If there are no digits, every character is converted to a space.

4 Comments

I think you are right. I was refactoring some old code which has multiple loops and indexOf()/substring() and I thought it could be done with a simple regex. Didn't even think about cleaning up the old logic. I think your approach would be the most efficient for this task. Thanks for thinking outside the box, i.e. my initial requirements.
Your code assumes that the AAA portion will be non-digits. This is contrary to the problem description, which says that A will be 'anything', which could include digits.
Well then, the solution can be slightly adapted to locate the point where a digit is followed by a non-digit. It still ends up being simpler than using regexes where they're not really necessary.
Yes, I had to add some logic to find the point after which digits are allowed. Still pretty simple.
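A minimal sketch of the adaptation described in these comments, assuming (as the OP confirmed) that the tail always begins with a non-digit; the class and method names are hypothetical. It scans forward for the first digit followed by a non-digit and blanks everything from there on, so digits inside the tail are blanked too.

```java
public class BlankAfterDigitRun {
    // Blanks everything from the first non-digit that directly follows a digit.
    static String blankTail(String input) {
        StringBuilder sb = new StringBuilder(input);
        for (int i = 1; i < sb.length(); i++) {
            if (Character.isDigit(sb.charAt(i - 1)) && !Character.isDigit(sb.charAt(i))) {
                // Found the boundary: overwrite this character and everything after it.
                for (int j = i; j < sb.length(); j++) {
                    sb.setCharAt(j, ' ');
                }
                break;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("'" + blankTail("AA12345d4 %") + "'"); // prints 'AA12345    '
    }
}
```

On the same input, the backward loop in this answer stops at the '4' inside the tail and returns 'AA12345d4  ', which is the problem the earlier comment points out.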
