
I need to be able to return signed and unsigned integer constants with no intervening symbols, possibly preceded by + or -. The only allowed digits are 3, 4, and 5.

I can't figure out a way to say that the expression must not contain a period before or after the integer.

This is what I have so far, but if I pass, say, "34.5 - 43", the returned string is "34 5 43".

All that needs to be returned is "43".

public String getInts(String toBeScanned){

    String INT = "";
    Pattern p = Pattern.compile("\\b[+-]?[3-5]+\\b");
    Matcher m = p.matcher(toBeScanned);

    if (m.matches() == true){
        INT = toBeScanned;
    }
    else{
        m = p.matcher(" " + toBeScanned);
        while (m.find()){
            INT = INT + m.group() + " ";
        }
    }
    return INT;
}

Any thoughts or pushes in the right direction are appreciated. Is there a way to say that the first and last character can be [\b and not .]?

This is frustrating the heck out of me. Help!

  • "\\b?(?:([+-]?[3-5.]+) ?)+\\b?" and get the last group of the matcher? Commented Feb 25, 2012 at 20:40
  • Actually, \b is broken in Java patterns as a word boundary unless you either use 7-bit characters only (not Java’s native charset), or else use the Java 7 Pattern.UNICODE_CHARACTER_CLASS compilation option or the equivalent embedded "(?U)" flag. So either make sure you have 7-bit data, or use the new flag, or don’t use \b. Commented Feb 25, 2012 at 21:41

3 Answers


You don't want a word boundary \b here. I think the best approach is to create your own assertions. Try this:

(?<![.\d])[+-]?[3-5]+(?![.\d])

See it here on Regexr

(?<![.\d]) is a negative lookbehind assertion: it says that no dot and no digit may appear immediately before the pattern.

(?![.\d]) is a negative lookahead assertion: it says that no dot and no digit may appear immediately after the pattern.

Improvement

To avoid matching things like "hf34", we can make it stricter:

(?<![.\w])[+-]?[3-5]+(?![.\w])

See it on Regexr
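
As a rough Java sketch of how the stricter pattern could replace the one in the question: the class name `IntScanner` and the method `findInts` are made up for illustration, and returning a list instead of a space-separated string is my own choice, not something from the question. It assumes plain ASCII input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question's code.
class IntScanner {
    // No dot or word character may touch the number on either side.
    private static final Pattern INT =
            Pattern.compile("(?<![.\\w])[+-]?[3-5]+(?![.\\w])");

    // Collects every standalone integer made only of the digits 3, 4 and 5.
    static List<String> findInts(String toBeScanned) {
        List<String> result = new ArrayList<>();
        Matcher m = INT.matcher(toBeScanned);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findInts("34.5 - 43")); // [43]
        System.out.println(findInts("hf35 43"));   // [43]
        System.out.println(findInts("-45 +3"));    // [-45, +3]
    }
}
```

Note that every backslash is doubled inside the Java string literal, and that the optional sign is kept as part of the match.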

The word boundary \b

\b matches at a change between a word character and a non-word character, in either direction. A word character is a letter, a digit, or _. That means you will also have problems with the \b before the [+-], because there is no \b between a space (or the start of the string) and a + or -.
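
To see this behaviour in Java, here is a small sketch that runs the pattern from the question against two ASCII inputs. The class `WordBoundaryDemo` and its `findAll` helper are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class WordBoundaryDemo {
    // Returns every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String withBoundaries = "\\b[+-]?[3-5]+\\b";
        // The sign is lost: there is no boundary between ' ' and '-'.
        System.out.println(findAll(withBoundaries, " -43"));      // [43]
        // The dot is a non-word character, so "34" and "5" both match.
        System.out.println(findAll(withBoundaries, "34.5 - 43")); // [34, 5, 43]
    }
}
```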


5 Comments

The problem is that (?<![.\d]) disallows only a dot or a digit; anything else is allowed. So if " hf35 43" is passed, the returned string is "35 43", when it should be just "43".
Then just change the \d to \w: (?<![.\w])[+-]?[3-5]+(?![.\w]).
You sir, are a genius. This works beautifully. And Regexr is a nice tool, thanks for showing me. I would have never figured that out, so thanks a ton.
Remember that your definition of \b is only true of 7-bit data, and that it will misbehave on Unicode if you don’t use the new flag from Java 7. Before then Java Patterns were not usable with \b on the native charset.
@tchrist thanks for the info. I knew that there have been a lot of improvements for regex in Java 7, but until now I was unaware of this flag.

"\b[+-]?[3-5]+[.][3-5]+\b"

This pattern says that, in order to match, there must be at least one digit before and at least one digit after the decimal point.

1 Comment

That's exactly what I don't want to do. If there is a "." preceding or following the number, it needs to be ignored.

Is there a way to say that the first and last character can be [\b and not .]?

[^\.\b]

matches \b but not '.'

Is that what you are looking for?

[^\.\b][+-]?[3-5]+[^\.\b]

Will match '43' but not '34.5'

2 Comments

[^\.\b] means it can be ^\. OR \b. So this won't work. I tried that already lol.
What do you think [\b] means? A boundary is a zero-width assertion, so that can’t be used in a charclass. Would you believe backspace?
