I am trying to write a regular expression for these kinds of strings:

05 IMA-POLICY-ID         PIC X(15).               00020068

05 (AMENT)-GROUPCD       PIC X(10).

I want to parse anything between 05 and the first tab. The line might start with tabs or spaces, followed by the digits. The initial number can be anything: 05, 10, 15.

So in the first line I need to parse IMA-POLICY-ID, and in the second line (AMENT)-GROUPCD.

This is the code I have written, and it's not finding the pattern. Where am I going wrong?

Pattern p1 = Pattern.compile("^[0-9]+\\s\\S+\t$"); 
Matcher m1 = p1.matcher(line); 
System.out.println("m1 =="+m1.group());
  • And remove the anchor. Commented Mar 18, 2014 at 9:34
  • @devnull: \t should also work (it matches the literal tab character instead of the tab metacharacter, but that works identically). Commented Mar 18, 2014 at 9:35
  • Hi, the line might start with tabs or spaces and then a digit. Commented Mar 18, 2014 at 11:16
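
To illustrate the point in the second comment, here is a small sketch (the class and method names are hypothetical) that compiles the same tab check both ways: "\t" puts an actual tab character into the pattern, while "\\t" hands the regex engine the \t escape. Both match the tab after IMA-POLICY-ID, and neither matches a plain space. One caveat: under Pattern.COMMENTS, unescaped whitespace in a pattern is ignored, so the literal-tab form would stop working in that mode.

```java
import java.util.regex.Pattern;

class TabEscapeDemo {
    // "\t" in Java source is a literal TAB character placed into the pattern;
    // "\\t" is backslash + 't', which the regex engine reads as the tab escape.
    static final Pattern LITERAL_TAB = Pattern.compile("ID\t");
    static final Pattern ESCAPED_TAB = Pattern.compile("ID\\t");

    static boolean literalMatches(String s) {
        return LITERAL_TAB.matcher(s).find();
    }

    static boolean escapedMatches(String s) {
        return ESCAPED_TAB.matcher(s).find();
    }

    public static void main(String[] args) {
        String line = "05 IMA-POLICY-ID\tPIC X(15).";
        System.out.println(literalMatches(line)); // true
        System.out.println(escapedMatches(line)); // true
    }
}
```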

6 Answers

Pattern p1 = Pattern.compile("\\b(?:05|1[05])\\b[^\\t]*\\t"); 

will match anything from 05, 10 or 15 until the nearest \t.

Explanation:

\b           # Start of number/word
(?:05|1[05]) # Match 05, 10 or 15
\b           # End of number/word
[^\t]*       # Match any number of characters except tab
\t           # Match a tab
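
A minimal sketch of this pattern in use, assuming the input is processed one line at a time (the class and method names are made up for illustration). Group 0 still contains the level number and the trailing tab, so the name has to be cut out of it. Also, because the pattern is not anchored, calling find() repeatedly on the same line also matches the 15 in X(15); calling find() once per line avoids that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LevelToTabDemo {
    // The pattern from this answer: 05, 10 or 15 as a whole word,
    // then everything up to and including the nearest tab.
    static final Pattern P = Pattern.compile("\\b(?:05|1[05])\\b[^\\t]*\\t");

    // Returns the whole first match on the line (group 0), or null.
    static String firstMatch(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Counts every match on the line, to show the unanchored behaviour.
    static int countMatches(String line) {
        Matcher m = P.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "  05 IMA-POLICY-ID\tPIC X(15).\t00020068";
        // The pattern only matches two-digit levels, so drop those two
        // characters and trim the separating space and the trailing tab.
        String name = firstMatch(line).substring(2).trim();
        System.out.println(name);               // IMA-POLICY-ID
        System.out.println(countMatches(line)); // 2 (the second hit is "15).\t")
    }
}
```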

^\d+\s+([^\s]+)

This will match your requirement; the field name is captured in group 1.

demo here : http://regex101.com/r/rQ7fT3
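
Written as a Java string literal, every backslash in this regex has to be doubled. The sketch below (hypothetical class and method names) applies it to the two sample lines and reads the name from group 1. Because ^\d+ requires the digits at the very start of the line, an indented line, which the asker says can occur, does not match as written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstTokenDemo {
    // ^\d+\s+([^\s]+) written as a Java string literal: every backslash doubled.
    static final Pattern P = Pattern.compile("^\\d+\\s+([^\\s]+)");

    // Returns group 1 (the field name), or null when the line does not match.
    static String fieldName(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fieldName("05 IMA-POLICY-ID\tPIC X(15).\t00020068")); // IMA-POLICY-ID
        System.out.println(fieldName("05 (AMENT)-GROUPCD\tPIC X(10)."));         // (AMENT)-GROUPCD
        // ^ requires the digits at column 0, so an indented line is rejected.
        System.out.println(fieldName("\t10 SOME-FIELD\tPIC 9(4)."));             // null
    }
}
```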

Your regex is almost correct. Just remove the \t$ at the end of your regex and capture the \\S+ as a group:

Pattern p1 = Pattern.compile("^[0-9]+\\s(\\S+)");

Now print it as:

Matcher m1 = p1.matcher(line);
if (m1.find()) {
    System.out.println(m1.group(1));
}
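
Separately from the \t$ problem, the question's snippet calls m1.group() before any find() or matches() call, and Matcher throws IllegalStateException in that case. A small sketch (hypothetical class and method names) contrasting the two call orders with the pattern from this answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeFindDemo {
    static final Pattern P = Pattern.compile("^[0-9]+\\s(\\S+)");

    // Calls group() without find()/matches() first, as the question's code does.
    static boolean groupWithoutFindThrows(String line) {
        Matcher m = P.matcher(line);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true; // no match has been attempted yet
        }
    }

    // The order used in this answer: find() first, then read group 1.
    static String fieldName(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "05 IMA-POLICY-ID\tPIC X(15).\t00020068";
        System.out.println(groupWithoutFindThrows(line)); // true
        System.out.println(fieldName(line));              // IMA-POLICY-ID
    }
}
```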

Your pattern expects the line to end right after IMA-POLICY-ID etc. (plus a tab), because of the $ at the end.

If there is no whitespace in the string you want to match (I assume there isn't, because of your use of \S+), I'd change the pattern to ^\d+\s+(\S+). That should be sufficient to match any number at the start of a line, followed by whitespace and then the group of non-whitespace characters you want to match (note that a tab is whitespace as well).

If you need to match until the first tab or the end of the input and include other whitespace, replace (\S+) with ([^\t]+).
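
A sketch of the difference between the two capture groups, using a hypothetical line whose name contains a space (real COBOL data names would not, so this only illustrates the regex behaviour; the class name is also made up). Keep in mind that ([^\t]+) also keeps any spaces that pad the name before the tab.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UntilTabDemo {
    // Stops at the first whitespace character of any kind.
    static final Pattern NON_WHITESPACE = Pattern.compile("^\\d+\\s+(\\S+)");
    // Stops only at the first tab (or the end of the input).
    static final Pattern UNTIL_TAB = Pattern.compile("^\\d+\\s+([^\\t]+)");

    static String capture(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical line whose name contains a space.
        String line = "05 SOME NAME\tPIC X(5).";
        System.out.println(capture(NON_WHITESPACE, line)); // SOME
        System.out.println(capture(UNTIL_TAB, line));      // SOME NAME
    }
}
```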

I can see two things that might prevent your Pattern from working.

  1. Firstly, your input Strings contain multiple tab-separated values, so the $ "end-of-input" anchor at the end of your Pattern will fail to match the String.
  2. Secondly, you want to find what's between 05 (etc.) and the first tab. Therefore you need to wrap your desired expression in parentheses (e.g. (\\S+)) and refer to it by its group number (in this case, group 1).

Here's an example:

String input = "05 IMA-POLICY-ID\tPIC X(15).\t00020068" +
                "\r\n05 (AMENT)-GROUPCD\tPIC X(10).";
//                           | 0, 1, or 5 twice (refine here if needed)
//                           |       | 1 whitespace
//                           |       |  | your queried expression (here I use a 
//                           |       |  | reluctant dot search
//                           |       |  |    | tab
//                           |       |  |    |  | anything after, reluctant
Pattern p = Pattern.compile("[015]{2}\\s(.+?)\t.+?");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println("Found: " + m.group(1));
}

Output

Found: IMA-POLICY-ID
Found: (AMENT)-GROUPCD

This is what I came up with, and it worked:

String re = "^\\s+\\d+\\s+([^\\s]+)";
Pattern p1 = Pattern.compile(re, Pattern.MULTILINE); 
Matcher m1 = p1.matcher(line);
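
Pattern.MULTILINE makes ^ match at the start of every line, so this regex can be run with a find() loop over a whole block of text. The sketch below (hypothetical class and method names) compares the pattern as posted with a variant using \s* instead of \s+; that variant is an adjustment, not part of the answer, reflecting the note that a line only might start with tabs or spaces. As posted, \s+ skips a line that starts directly with the level number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // The pattern as posted: requires at least one leading whitespace character.
    static final String ORIGINAL = "^\\s+\\d+\\s+([^\\s]+)";
    // Variant (an assumption, not from the answer): \s* makes indentation optional.
    static final String OPTIONAL_INDENT = "^\\s*\\d+\\s+([^\\s]+)";

    // MULTILINE lets ^ match at the start of every line, so one find() loop
    // over the whole text returns one field name per matching line.
    static List<String> fieldNames(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        List<String> names = new ArrayList<>();
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "05 IMA-POLICY-ID\tPIC X(15).\t00020068\n"
                    + "\t05 (AMENT)-GROUPCD\tPIC X(10).";
        System.out.println(fieldNames(ORIGINAL, text));        // [(AMENT)-GROUPCD]
        System.out.println(fieldNames(OPTIONAL_INDENT, text)); // [IMA-POLICY-ID, (AMENT)-GROUPCD]
    }
}
```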
