Java Pattern regex search between strings

Question

Given the following strings (stringToTest):

G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]
G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]

And the Pattern:

Pattern p = Pattern.compile("G2:\\S+RQ:3,4");
if (p.matcher(stringToTest).find())
{
    // Match
}

For string 1 I DON'T want to match, because RQ:3,4 is associated with the G3 section, not G2, and I want string 2 to match, as RQ:3,4 is associated with G2 section.

The problem with the current regex is that it's searching too far and reaching the RQ:3,4 eventually in case 1 even though I don't want to consider past the G2 section.

It's also possible that the stringToTest might be (just one section):

G2:7JAPjGdnGy8jxR8[RQ:3,4]

The strings 7JAPjGdnGy8jxR8 and jRo6pN8ZW9aglYz are variable length hashes.

Can anyone help me with the correct regex to use, one that starts looking at G2 for RQ:3,4 but stops if it reaches the end of the string or -G (the start of the next section)?

Is the hyphen only possible in front of the next section? If yes, subtract - from \S: G2:[^\s-]*RQ:3,4. In the general case, you may use G2:(?:(?!-G)\S)*RQ:3,4, see the regex demo. (?:(?!-G)\S)* is a tempered greedy token that matches 0+ occurrences of a non-whitespace char that does not start a -G substring.
– Wiktor Stribiżew, Commented Aug 29, 2018 at 9:53
Yes Wiktor, the hyphen is only possible in front of the next section.
– Steve Ford, Commented Aug 29, 2018 at 10:01
See my answer below with some explanations and above solutions.
– Wiktor Stribiżew, Commented Aug 29, 2018 at 10:08

anubhava · Accepted Answer · 2018-08-29 09:55:39Z

2

You may use this regex with a negative lookahead in between:

G2:(?:(?!G\d+:)\S)*RQ:3,4

RegEx Demo

RegEx Details:

- G2: matches the literal text G2:
- (?: starts a non-capturing group
  - (?!G\d+:) asserts that we don't have a G<digit>: ahead of us
  - \S matches a non-whitespace character
- )* ends the non-capturing group and matches 0 or more of it
- RQ:3,4 matches the literal text RQ:3,4

In Java use this regex:

String re = "G2:(?:(?!G\\d+:)\\S)*RQ:3,4";
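As a quick check, the regex above can be run against the three sample strings from the question. This is only a sketch: the class name SectionLookaheadDemo and the helper g2HasRq34 are made up for illustration and are not part of the answer.

```java
import java.util.regex.Pattern;

class SectionLookaheadDemo {
    // Regex from this answer, compiled once and reused.
    static final Pattern G2_RQ34 = Pattern.compile("G2:(?:(?!G\\d+:)\\S)*RQ:3,4");

    // Hypothetical helper: true when RQ:3,4 is found before the next G<digits>: header.
    static boolean g2HasRq34(String input) {
        return G2_RQ34.matcher(input).find();
    }

    public static void main(String[] args) {
        // Sample strings copied from the question.
        String s1 = "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]";
        String s2 = "G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]";
        String s3 = "G2:7JAPjGdnGy8jxR8[RQ:3,4]";

        System.out.println(g2HasRq34(s1)); // false
        System.out.println(g2HasRq34(s2)); // true
        System.out.println(g2HasRq34(s3)); // true
    }
}
```

Compiling the Pattern once as a constant avoids recompiling it on every call, which matters if many strings are tested.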


1 Comment

Steve Ford Over a year ago

Cheers, this is the solution I used in the end.

Wiktor Stribiżew · Accepted Answer · 2018-08-29 10:06:40Z

The problem is that \S matches any non-whitespace char, including the hyphen and the characters of the next section, and the regex engine parses the text from left to right. Once it finds G2:, it grabs all non-whitespace chars to the right (since \S+ is a greedy subpattern) and then backtracks to find the rightmost occurrence of RQ:3,4.
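To see the overreach concretely, the sketch below prints what the question's original pattern actually matches in string 1. The class name GreedyOverreachDemo and the helper firstMatch are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyOverreachDemo {
    // Returns the text matched by the question's original pattern, or null if nothing matches.
    static String firstMatch(String input) {
        Matcher m = Pattern.compile("G2:\\S+RQ:3,4").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // String 1 from the question: RQ:3,4 belongs to the G3 section only.
        String s1 = "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]";

        // The match runs through the hyphen and the whole G3 section.
        System.out.println(firstMatch(s1));
        // prints G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4
    }
}
```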

In a general case, you may use

String regex = "G2:(?:(?!-G)\\S)*RQ:3,4";

See the regex demo. (?:(?!-G)\S)* is a tempered greedy token that will match 0+ occurrences of a non-whitespace char that does not start a -G substring.
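A small sketch of how the tempered greedy token behaves, showing the matched text rather than just a yes/no result. The class TemperedTokenDemo, the helper matchedText, and the input G2:ab-cd[RQ:3,4] (a hash containing a hyphen) are assumptions added for illustration; per the comments, such hashes do not occur in the asker's data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedTokenDemo {
    // Regex from this answer: each consumed char must not be the start of "-G".
    static final Pattern TEMPERED = Pattern.compile("G2:(?:(?!-G)\\S)*RQ:3,4");

    // Returns the matched text, or null when the pattern is not found.
    static String matchedText(String input) {
        Matcher m = TEMPERED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // No match: the token stops at "-G3", before the only RQ:3,4.
        System.out.println(matchedText("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // null

        // Match stays inside the G2 section.
        System.out.println(matchedText("G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]"));
        // prints G2:7JAPjGdnGy8jxR8[RQ:3,4

        // Hypothetical input: a hyphen not followed by G is still consumed.
        System.out.println(matchedText("G2:ab-cd[RQ:3,4]")); // G2:ab-cd[RQ:3,4
    }
}
```

The last case is what separates the general pattern from the hyphen-excluding variants: only a hyphen followed by G ends the scan.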

If the hyphen is only possible in front of the next section, you may subtract - from \S:

String negated = "G2:[^\\s-]*RQ:3,4";         // using a negated character class
String subtracted = "G2:[\\S&&[^-]]*RQ:3,4";  // using character class subtraction (intersection with [^-])

See this regex demo. [^\s-]* will match 0 or more chars other than whitespace and -.
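The two hyphen-excluding variants should behave identically; the sketch below runs both over the question's sample strings to check that. The class NegatedClassDemo, the constant names, and the helper found are illustrative, not part of the answer.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Negated class: any char except whitespace and the hyphen.
    static final Pattern NEGATED = Pattern.compile("G2:[^\\s-]*RQ:3,4");
    // Intersection: non-whitespace chars that are also not a hyphen.
    static final Pattern INTERSECTED = Pattern.compile("G2:[\\S&&[^-]]*RQ:3,4");

    // Hypothetical helper: true when the pattern occurs anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // Sample strings copied from the question.
        String[] samples = {
            "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]",
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]",
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]"
        };
        for (String s : samples) {
            // Both columns agree: false, true, true for the three samples.
            System.out.println(found(NEGATED, s) + " " + found(INTERSECTED, s) + "  " + s);
        }
    }
}
```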

Julio · Accepted Answer · 2018-08-29 10:00:03Z

0

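Try using [^\[] instead of \S in this regex: G2:[^\[]*\[RQ:3,4

[^\[] means any character but [. The bracket is escaped because Java treats an unescaped [ inside a character class as the start of a nested class.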

Demo

(considering that strings like this: G2:7JAP[jGd]nGy8[]R8[RQ:3,4] are not possible)
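A sketch of this approach in Java, including the caveat above. One assumption to flag: the [ inside the character class is escaped here, because Java treats an unescaped [ within a class as the start of a nested class. The class BracketDelimiterDemo and the helper g2HasRq34 are illustrative names.

```java
import java.util.regex.Pattern;

class BracketDelimiterDemo {
    // Scan from G2: up to the first '[' and require RQ:3,4 right after it.
    static final Pattern UP_TO_BRACKET = Pattern.compile("G2:[^\\[]*\\[RQ:3,4");

    // Hypothetical helper: true when the first bracket after G2: opens RQ:3,4.
    static boolean g2HasRq34(String input) {
        return UP_TO_BRACKET.matcher(input).find();
    }

    public static void main(String[] args) {
        // Sample strings copied from the question.
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // false
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // true
        System.out.println(g2HasRq34("G2:7JAPjGdnGy8jxR8[RQ:3,4]"));                            // true

        // The caveat from the answer: a hash containing '[' defeats this pattern.
        System.out.println(g2HasRq34("G2:7JAP[jGd]nGy8[]R8[RQ:3,4]")); // false
    }
}
```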

