Regex to replace part of the string with spaces

Question

It seems simple, but I can't get it to work.

I have a string that looks like 'NNDDDDDAAAA', where 'N' is a non-digit, 'D' is a digit, and 'A' is anything. I need to replace each A with a space character. The number of 'N's, 'D's, and 'A's varies from one input string to the next.

I know how to do it with two expressions: I can split the string into two groups and then replace everything in the second group with spaces, like this:

    Pattern pattern = Pattern.compile("(\\D+\\d+)(.+)");
    Matcher matcher = pattern.matcher(input);
    if (matcher.matches()) {
        return matcher.group(1) + matcher.group(2).replaceAll(".", " ");
    }

But I was wondering if it is possible with a single regex.

How would you tell the difference between the 'A's and the others? Do 'A's always come after the 'D's? – BryanH, Jul 8, 2009 at 22:18
How do you tell the difference between the last "D" and the first "A"? Is the group of "A"s guaranteed not to be a "D" type character? – Jason Musgrove, Jul 8, 2009 at 23:00
Curtis Tasker is correct: the first A after NNDDDD is always an N, and the rest is anything. – user135273, Jul 9, 2009 at 14:15

Curtis Tasker · Accepted Answer · 2009-07-09 21:21:44Z

Given your description, I'm assuming that after the NNDDDDD portion, the first A will actually be an N rather than an A, since otherwise there's no solid boundary between the DDDDD and AAAA portions.

So your string actually looks like NNDDDDDNAAA, and you want to replace the NAAA portion with spaces. Given this, the regex can be rewritten as: (\\D+\\d+)(\\D.+)

Positive lookbehind in Java requires a pattern with a bounded maximum length, so you can't use the + or * quantifiers. You can instead use curly braces to specify a maximum length. For instance, you can use {1,9} in place of each +, and it will match between 1 and 9 characters: (?<=\\D{1,9}\\d{1,9})(\\D.+)

The only problem here is you're matching the NAAA sequence as a single match, so using "NNNDDDDNAAA".replaceAll("(?<=\\D{1,9}\\d{1,9})(\\D.+)", " ") will result in replacing the entire NAAA sequence with a single space, rather than multiple spaces.
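A quick check of the single-space effect, using a concrete string in place of the NNNDDDDNAAA placeholders (the input "ab123x9!z" is made up for illustration):

```java
public class LookbehindDemo {
    public static void main(String[] args) {
        // Java accepts bounded quantifiers such as {1,9} inside a lookbehind.
        // The whole tail "x9!z" is consumed by one match, so it collapses to one space.
        String out = "ab123x9!z".replaceAll("(?<=\\D{1,9}\\d{1,9})(\\D.+)", " ");
        System.out.println("[" + out + "]"); // prints "[ab123 ]"
    }
}
```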

You could take the start index of the match and the string length and use them to append the correct number of spaces, but I don't see the point. I think you're better off with your original solution; it's simple and easy to follow.
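For what it's worth, later Java versions make the "append the correct number of spaces" idea a single call: Java 9 added Matcher.replaceAll(Function<MatchResult, String>), which computes each replacement from the match. A sketch, assuming Java 11+ for String.repeat; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneRegex {
    // N's (non-digits), then D's (digits), then a tail that starts with a non-digit.
    // Anchored so the whole string is one match and only the first digit run splits it.
    private static final Pattern P = Pattern.compile("^(\\D+\\d+)(\\D.*)$");

    // Keeps group 1 and replaces every character of group 2 with a space.
    // Inputs that don't match the pattern are returned unchanged.
    static String blankTail(String input) {
        Matcher m = P.matcher(input);
        // "$1" is expanded to group 1's text as-is; the spaces contain no special characters.
        return m.replaceAll(r -> "$1" + " ".repeat(r.group(2).length()));
    }

    public static void main(String[] args) {
        System.out.println("[" + blankTail("AA12345d4 %") + "]"); // prints "[AA12345    ]"
    }
}
```

Using the "$1" group reference rather than concatenating r.group(1) directly avoids problems when the N part itself contains '$' or '\'.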

If you're looking for a little extra speed, you could compile your Pattern outside the function, and use StringBuilder or StringBuffer to create your output. If you're building a large String out of all these NNDDDDDAAAAA elements, work entirely in StringBuilder until you're done appending.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Test {

        // compiled once and reused for every call
        public static Pattern p = Pattern.compile("(\\D+\\d+)(\\D.+)");

        public static StringBuffer replace( String input ) {
            StringBuffer output = new StringBuffer();
            Matcher m = Test.p.matcher(input);
            // keep the NNDDDD prefix, blank out every character of the NAAA tail
            if( m.matches() )
                output.append( m.group(1) ).append( m.group(2).replaceAll("."," ") );

            return output;
        }

        public static void main( String[] args ) {
            String input = args[0];
            long startTime;

            StringBuffer tests = new StringBuffer();
            startTime = System.currentTimeMillis();
            for( int i = 0; i < 50; i++)
            {
                tests.append( "Input -> Output: '" );
                tests.append( input );
                tests.append( "' -> '" );
                tests.append( Test.replace( input ) );
                tests.append( "'\n" );
            }
            System.out.println( tests.toString() );
            // elapsed time in milliseconds
            System.out.println( "\n" + (System.currentTimeMillis()-startTime));
        }

    }

Update: I wrote a quick iterative solution, and ran some random data through both. The iterative solution is around 4-5x faster.

    public static StringBuffer replace( String input )
    {
        StringBuffer output = new StringBuffer();
        boolean second = false, third = false;  // inside the D run / inside the A run
        for( int i = 0; i < input.length(); i++ )
        {
            char c = input.charAt(i);

            // the first digit starts the DDDD run
            if( !second && Character.isDigit(c) )
                second = true;

            // the first non-digit after the DDDD run starts the part to blank out
            // (matches the \D in the regex version; the N may not be a letter)
            if( second && !third && !Character.isDigit(c) )
                third = true;

            if( second && third )
                output.append( ' ' );
            else
                output.append( c );
        }

        return output;
    }

Robert Greiner · Accepted Answer · 2009-07-08 22:17:28Z


What do you mean by non-digit vs. anything?

The regex

    [^a-zA-Z0-9]

matches every character that is not an ASCII letter or digit.

You would want to replace anything that gets matched by the above regex with a space.

Is this what you were talking about?


3 Comments

BryanH Over a year ago

Don't you mean /[^a-zA-Z0-9]/ /g ?

Robert Greiner Over a year ago

That would delete the "anything" matches. I just wanted to post the regex that actually matches "anything"; I will take the slashes out to clear things up. Thanks.

user135273 Over a year ago

'anything' means anything, i.e. letters, digits, whitespace. I want to replace each occurrence with a space. For instance, 'AA12345d4 %' would be replaced with 'AA12345    ' (four spaces at the end).
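Running the character-class regex from this answer on that example shows why it doesn't fit: it blanks non-alphanumeric characters wherever they occur, and leaves the "d4" part of the tail alone.

```java
public class CharClassDemo {
    public static void main(String[] args) {
        // Replaces every character that is not an ASCII letter or digit, wherever it occurs.
        String out = "AA12345d4 %".replaceAll("[^a-zA-Z0-9]", " ");
        System.out.println("[" + out + "]"); // prints "[AA12345d4  ]"
    }
}
```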

Simeon Pilgrim · Accepted Answer · 2009-07-08 22:39:16Z


You want to use positive lookbehind to match the N's and D's, then use a normal match for the A's.

I'm not sure of the positive lookbehind syntax in Java, but here is an article on Java regex with lookbehind.


5 Comments

Amal Sirisena Over a year ago

I was just about to post that... honest! I don't know whether you are allowed to have a variable-length lookbehind pattern, though, e.g. (?<=\D+)

Simeon Pilgrim Over a year ago

Not sure about the Java regex: I've read some articles about positive/negative lookahead/lookbehind restrictions in the three major families of regex engines, and the main takeaway I had was that the .NET regex could do the good stuff, but sometimes just because it can doesn't mean you should.

laz Over a year ago

Here's a nice description of various engines' support for look behind: regular-expressions.info/lookaround.html#limitbehind

newacct Over a year ago

No, in general they do not allow variable-width look behind. "(?<=\D+)" is allowed because it is equivalent to the fixed-width look behind "(?<=\D)"

newacct Over a year ago

And in any case, even if a look behind worked, it would not solve the OP's problem, which is to replace every character in the matched group with a space. There is no replacement string that will allow you to perform "replace this with a string of spaces of the same length".

Vinay Sajip · Accepted Answer · 2009-07-09 05:40:23Z


I know you asked for a regex, but why do you even need a regex for this? How about:

    StringBuilder sb = new StringBuilder(inputString);
    // walk backwards from the end, blanking characters until the last digit
    for (int i = sb.length() - 1; i >= 0; i--) {
        if (Character.isDigit(sb.charAt(i)))
            break;
        sb.setCharAt(i, ' ');
    }
    String output = sb.toString();

You might find this post interesting. Of course, the above code assumes there will be at least one digit in the string: all characters following the last digit are converted to spaces. If there are no digits, every character is converted to a space.
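Wrapped in a method (the class and method names below are illustrative), the loop can be tried on two inputs. It works when the tail contains no digits, but a digit inside the tail, as in the 'AA12345d4 %' example from the thread, stops the backward scan early:

```java
public class BackwardScan {
    // The loop from this answer, wrapped in a static method.
    static String blankTrailingNonDigits(String inputString) {
        StringBuilder sb = new StringBuilder(inputString);
        for (int i = sb.length() - 1; i >= 0; i--) {
            if (Character.isDigit(sb.charAt(i)))
                break; // stop at the last digit anywhere in the string
            sb.setCharAt(i, ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + blankTrailingNonDigits("AA12345dx %") + "]"); // prints "[AA12345    ]"
        System.out.println("[" + blankTrailingNonDigits("AA12345d4 %") + "]"); // prints "[AA12345d4  ]"
    }
}
```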

answered Jul 9, 2009 at 5:35

4 Comments

user135273 Over a year ago

I think you are right. I was refactoring some old code which has multiple loops and indexOf()/substring() and I thought it could be done with a simple regex. Didn't even think about cleaning up the old logic. I think your approach would be the most efficient for this task. Thanks for thinking outside the box, i.e. my initial requirements.

Curtis Tasker Over a year ago

Your code assumes that the AAA portion will be non-digits. This is contrary to the problem description, which says that A will be 'anything', which could include digits.

Vinay Sajip Over a year ago

Well then, the solution can be slightly adapted to locate the point where a digit is followed by a non-digit. It still ends up being simpler than using regexes where they're not really necessary.

user135273 Over a year ago

Yes, I had to add additional logic to find the point where digits are allowed. Still pretty simple.
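The adaptation described in the last comments could look roughly like this: scan forward past the leading non-digits and the first digit run, then blank everything from the first non-digit after that run. This is only a sketch; the class and method names are made up, and an input with no digits is returned unchanged, which is a choice the thread does not specify.

```java
public class ForwardScan {
    // Blanks everything from the first non-digit that follows the first run of digits.
    static String blankAfterDigits(String input) {
        StringBuilder sb = new StringBuilder(input);
        int i = 0;
        // skip the leading non-digits (the N's)
        while (i < sb.length() && !Character.isDigit(sb.charAt(i))) {
            i++;
        }
        // skip the digit run (the D's); digits later in the tail are blanked like anything else
        while (i < sb.length() && Character.isDigit(sb.charAt(i))) {
            i++;
        }
        // everything from here on is the A part
        for (int j = i; j < sb.length(); j++) {
            sb.setCharAt(j, ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + blankAfterDigits("AA12345d4 %") + "]"); // prints "[AA12345    ]"
    }
}
```

Note that Character.isDigit accepts any Unicode digit, whereas \d in a Java regex matches only [0-9] by default, so the two approaches can differ on non-ASCII input.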
