
I'm having trouble getting my regex to match all groups.

My regex: /^#(\|?[^|]+\|[^|]+\|?)+?$/g

Test string: #something|somethingelse|morestuff|evenmorestuff

The matches I want are the pairs: something|somethingelse and morestuff|evenmorestuff should be groups 1 and 2. But I can only get the last group to return.

My code looks like this (I'm using JavaScript):

var re = new RegExp('^#(\\|?[^|]+\\|[^|]+\\|?)+?$', 'g');
var matches = re.exec(window.location.hash);
console.log([matches[0], matches[1], matches[2]]);

matches[0] returns the whole string
matches[1] returns morestuff|evenmorestuff
matches[2] is undefined.

  • Do not use repeated capturing groups; use multiple matches and validate the string separately if necessary. Commented Mar 24, 2018 at 20:04
  • I'm not sure exactly what you mean. Do you mean I should specify a finite number of matches and not loop through them? Commented Mar 24, 2018 at 20:26
  • See my answer... Commented Mar 24, 2018 at 20:32

1 Answer


Your regex illustrates how a repeated capturing group works: (ab)+ only captures the last occurrence of ab in an abababab string, because each repetition overwrites the previous capture.
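The effect is easy to reproduce. This JavaScript sketch (runnable in Node.js or a browser console) runs /(ab)+/ and the question's pattern, written here as a regex literal with single backslashes, and prints what the capturing groups hold:

```javascript
// A repeated capturing group keeps only its last iteration.
const repeated = /(ab)+/.exec("abababab");
console.log(repeated[0]); // "abababab" (the whole match)
console.log(repeated[1]); // "ab" (only the last repetition survives)

// The question's pattern behaves the same way: there is only ONE group,
// and it is overwritten on each repetition, so the final pair is all that is left.
const hashDemo = "#something|somethingelse|morestuff|evenmorestuff";
const asked = /^#(\|?[^|]+\|[^|]+\|?)+?$/.exec(hashDemo);
console.log(asked[1]);     // "morestuff|evenmorestuff"
console.log(asked.length); // 2: index 0 plus group 1; there is no group 2
```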

In your case, you may perform two steps: 1) validate the input string to make sure it follows the pattern you want, and 2) extract the parts from the string using a regex with the g flag.

To validate the string you may use

/^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/

It is basically your original regex, but it is more efficient, has no capturing groups (we do not need them at this step), and does not allow | at the start or end of the string (if you need that, add \|* after # and before $).

Details

  • ^# - # at the start of the string
  • [^|]+ - 1+ chars other than |
  • \| - a |
  • [^|]+ - 1+ chars other than |
  • (?:\|[^|]+\|[^|]+)* - 0+ sequences of
    • \| - a | char
    • [^|]+\|[^|]+ - 1+ chars other than |, a | char, and again 1+ chars other than |
  • $ - end of string.
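To check the validation pattern against a few inputs, here is a small JavaScript sketch. The sample strings are made up for illustration, and rxValidateLoose is a hypothetical variant with \|* added after # and before $ as mentioned above:

```javascript
const rxValidate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/;
// Hypothetical looser variant: also tolerates stray "|" right after "#" and before the end.
const rxValidateLoose = /^#\|*[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*\|*$/;

const samples = [
  "#something|somethingelse|morestuff|evenmorestuff", // two complete pairs
  "#a|b",                                             // a single pair
  "#a|b|c",                                           // dangling third item
  "#|a|b|",                                           // leading and trailing "|"
  "a|b",                                              // missing leading "#"
];

// No g flag on either regex, so test() carries no lastIndex state between calls.
const results = samples.map((s) => [s, rxValidate.test(s), rxValidateLoose.test(s)]);
console.log(results);
```

Only the looser variant accepts "#|a|b|"; neither accepts an odd number of items or a string without the leading #.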

To extract the pairs, you may use a simple /([^|]+)\|([^|]+)/g regex, applied to the substring starting at index 1 (i.e. without the leading #).
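In environments that support String.prototype.matchAll (ES2020, e.g. Node.js 12+ and current browsers), the extraction step can also be written without an explicit exec() loop. This is a sketch of that alternative, not part of the original answer; note that matchAll throws a TypeError if the regex lacks the g flag:

```javascript
const input = "#something|somethingelse|morestuff|evenmorestuff";
const rxPairs = /([^|]+)\|([^|]+)/g; // matchAll requires the g flag

// Each match object carries its own groups 1 and 2, so nothing is overwritten.
const pairs = [...input.slice(1).matchAll(rxPairs)].map((m) => [m[1], m[2]]);

console.log(pairs);
```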

Whole solution:

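var s = "#something|somethingelse|morestuff|evenmorestuff";
var rx_validate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/; // whole-string check, no capturing groups
var rx_extract = /([^|]+)\|([^|]+)/g;                     // one pair per match; g lets exec() advance
var m, result = [];
if (rx_validate.test(s)) {
  var body = s.slice(1);                                  // drop the leading "#"
  while ((m = rx_extract.exec(body)) !== null) {
    result.push([m[1], m[2]]);                            // groups 1 and 2 of the current match
  }
}
console.log(result);
// => [ [ "something", "somethingelse" ], [ "morestuff", "evenmorestuff" ] ]
// or just pairs as strings
// console.log(s.slice(1).match(rx_extract));
// => [ "something|somethingelse", "morestuff|evenmorestuff" ]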
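If the parsing is needed in more than one place, the two steps can be wrapped in a helper. parseHashPairs is a hypothetical name, and returning null for malformed input is an assumed convention:

```javascript
// Hypothetical helper: returns an array of [left, right] pairs,
// or null when the hash does not have the "#a|b|c|d..." shape.
function parseHashPairs(hash) {
  const rxValidate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/;
  if (!rxValidate.test(hash)) {
    return null;
  }
  // A regex literal evaluated inside the function is a fresh RegExp per call,
  // so no lastIndex state can leak from one call to the next.
  const rxExtract = /([^|]+)\|([^|]+)/g;
  const body = hash.slice(1); // drop the leading "#"
  const pairs = [];
  let m;
  while ((m = rxExtract.exec(body)) !== null) {
    pairs.push([m[1], m[2]]);
  }
  return pairs;
}
```

In a browser you could call it as parseHashPairs(window.location.hash), for example from a hashchange event listener. Creating the g-flagged regex inside the function means a loop that exits early cannot leave a stale lastIndex behind for the next call.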
