
I'm trying to write a Java method that will take a string as a parameter and return another string if it matches a pattern, and null otherwise. The pattern:

  • Starts with a number (1+ digits); then followed by
  • A colon (":"); then followed by
  • A single whitespace (" "); then followed by
  • Any Java string of 1+ characters

Hence, some valid strings that match this pattern:

50: hello
1: d
10938484: 394958558

And some strings that do not match this pattern:

korfed49
: e4949
6
6:
6:sdjjd4

The general skeleton of the method is this:

public String extractNumber(String toMatch) {
    // If toMatch matches the pattern, extract the first number
    // (everything prior to the colon).

    // Else, return null.
}

Here's my best attempt so far, but I know I'm wrong:

public String extractNumber(String toMatch) {
    // If toMatch matches the pattern, extract the first number
    // (everything prior to the colon).
    String regex = "???";
    if(toMatch.matches(regex))
        return toMatch.substring(0, toMatch.indexOf(":"));

    // Else, return null.
    return null;
}

Thanks in advance.


2 Answers


Your description is spot on; now it just needs to be translated to a regex:

^      # Starts
\d+    # with a number (1+ digits); then followed by
:      # A colon (":"); then followed by
       # A single whitespace (" "); then followed by
\w+    # Any word character, one or more times
$      # (followed by the end of input)

Giving, in a Java string:

"^\\d+: \\w+$"

You also want to capture the numbers: put parentheses around \d+, use a Matcher, and capture group 1 if there is a match:

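import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the leading digits.
private static final Pattern PATTERN = Pattern.compile("^(\\d+): \\w+$");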

// ...

public String extractNumber(String toMatch) {
    Matcher m = PATTERN.matcher(toMatch);
    return m.find() ? m.group(1) : null;
}
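
To check the method against the sample strings from the question, here is a minimal, self-contained sketch. The class name ExtractNumberDemo and the main method are only illustrative and not part of the original answer; the pattern and the method body are the ones shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractNumberDemo {
    // Same pattern as above: digits (captured in group 1), a colon,
    // one space, then one or more word characters up to the end of input.
    private static final Pattern PATTERN = Pattern.compile("^(\\d+): \\w+$");

    public static String extractNumber(String toMatch) {
        Matcher m = PATTERN.matcher(toMatch);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The first three inputs are the valid examples from the question,
        // the remaining five are the invalid ones.
        String[] inputs = {
            "50: hello", "1: d", "10938484: 394958558",
            "korfed49", ": e4949", "6", "6:", "6:sdjjd4"
        };
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + extractNumber(input));
        }
    }
}
```

The three valid inputs should yield the leading number and the other five should yield null.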

Note: in Java, \w by default matches only ASCII letters and digits (this is not the case for .NET languages, for instance), and it also matches an underscore. If you don't want the underscore, you can use (Java-specific syntax):

[\w&&[^_]]

instead of \w for the last part of the regex, giving:

"^(\\d+): [\\w&&[^_]]+$"
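
To see how the choice of the last part of the regex changes the result, here is a small comparison sketch. The class name, the constant names, and the extract helper are hypothetical. The third pattern, which uses .+ for the tail, is an assumption that does not come from the answer; it is one way to read "any Java string of 1+ characters" if the text after the space may contain spaces or punctuation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TailVariantsDemo {
    // Tail is \w+: ASCII letters, digits and underscore.
    static final Pattern WORD = Pattern.compile("^(\\d+): \\w+$");
    // Tail is \w minus the underscore, using Java's character class intersection.
    static final Pattern NO_UNDERSCORE = Pattern.compile("^(\\d+): [\\w&&[^_]]+$");
    // Assumption, not from the answer: tail is any characters except line terminators.
    static final Pattern ANY_CHARS = Pattern.compile("^(\\d+): .+$");

    // Returns the leading number if the pattern is found, otherwise null.
    static String extract(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = { "7: ab1", "7: a_b", "7: hello world" };
        for (String input : inputs) {
            System.out.println("\"" + input + "\""
                + "  word=" + extract(WORD, input)
                + "  noUnderscore=" + extract(NO_UNDERSCORE, input)
                + "  anyChars=" + extract(ANY_CHARS, input));
        }
    }
}
```

With these inputs, "7: a_b" is accepted by \w+ but rejected once the underscore is excluded, and "7: hello world" is rejected by both word-character variants because \w does not match a space.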

4 Comments

@smit yes, given that the .matches() method is used -- I really loathe the name of this method, Java has made a mistake there
I don't understand what you mean by that. Can you be clearer?
@smit: when you use .matches(), it is as if you had surrounded the entire regex with ^ and $. That contradicts the very definition of regex matching, which can happen anywhere in the input. Real regex matching in Java is done using .find().
I see what you mean. After reading the Javadoc it is clearer. I think this link could be useful: matches and find
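
The difference between .matches() and .find() discussed in these comments can be illustrated with a short sketch. The class and method names are hypothetical; the regex is deliberately left unanchored so that only the choice of method decides the result.

```java
import java.util.regex.Pattern;

public class MatchesVsFindDemo {
    // No ^ or $ here on purpose.
    private static final Pattern UNANCHORED = Pattern.compile("\\d+: \\w+");

    // matches() succeeds only if the whole input matches the pattern.
    static boolean wholeInputMatches(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    // find() succeeds if any substring of the input matches the pattern.
    static boolean foundAnywhere(String s) {
        return UNANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "50: hello", "abc 50: hello xyz" };
        for (String input : inputs) {
            System.out.println("\"" + input + "\""
                + "  matches=" + wholeInputMatches(input)
                + "  find=" + foundAnywhere(input));
        }
    }
}
```

For "abc 50: hello xyz", find() reports a match while matches() does not, which is why a pattern used with find() needs explicit ^ and $ anchors to validate the whole string.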

Try using the following: \d+: \w+
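
As a rough sketch of how this regex could be dropped into the skeleton from the question: String.matches() tests the entire input, so no ^ or $ anchors are needed here. The class name SkeletonDemo is only illustrative.

```java
public class SkeletonDemo {
    public static String extractNumber(String toMatch) {
        // matches() requires the whole string to fit the pattern:
        // digits, a colon, one space, then one or more word characters.
        if (toMatch.matches("\\d+: \\w+")) {
            // Everything before the first colon is the number.
            return toMatch.substring(0, toMatch.indexOf(':'));
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("50: hello"));
        System.out.println(extractNumber("6:sdjjd4"));
    }
}
```

Note that String.matches() compiles the regex on every call, so a precompiled Pattern may be preferable if the method is called often.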
