1

I want to extract each block of alphanumeric characters that comes after an underscore in a JavaScript string. I currently have it working using a combination of string methods and regex like so:

var string = "ignore_firstMatch_match2_thirdMatch";    
var firstValGone = string.substr(string.indexOf('_'));
// returns "_firstMatch_match2_thirdMatch"
var noUnderscore = firstValGone.match(/[^_]+/g);
// returns ["firstMatch", "match2" , "thirdMatch"]

I'm wondering if there's a way to do it purely using regex? Best I've managed is:

var string = "ignore_firstMatch_match2_thirdMatch";
var matchTry = string.match(/_[^_]+/g);
// returns ["_firstMatch", "_match2", "_thirdMatch"]

but that returns the preceding underscore too. Given you can't use lookbehinds in JS I don't know how to match the characters after, but exclude the underscore itself. Is this possible?

  • Just use a capture group _([^_]+) and use RegExp#exec in a loop. Commented Mar 24, 2016 at 20:39
  • stackoverflow.com/a/432503/4028085 Commented Mar 24, 2016 at 20:42
  • Cheers for all the replies. I hadn't tried looping through groups because I thought there might be a 'cleaner' way to do it, so thanks for letting me know that loops and groups are the way to go. Commented Mar 24, 2016 at 21:03

3 Answers

2

You can use a capture group, _([^_]+), and call RegExp#exec in a loop, pushing each captured value into an array:

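var re = /_([^_]+)/g;
var str = 'ignore_firstMatch_match2_thirdMatch';
var res = [];
var m; // declared explicitly so the loop does not create an implicit global

while ((m = re.exec(str)) !== null) {
    res.push(m[1]); // m[1] is the text captured after the underscore
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>";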

Note that String#match() with a regex that has the global modifier /g returns only the full matches and discards all captured groups, which is why you cannot just use str.match(/_([^_]+)/g).
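
For readers on newer runtimes, the same capture-group idea can be written without an explicit loop. This is only a sketch, and it assumes an ES2020+ engine where String.prototype.matchAll exists; the variable names are made up for illustration. It also shows what String#match with /g returns, to make the point above concrete.

```javascript
// Assumption: an ES2020+ runtime where String.prototype.matchAll exists.
var str = 'ignore_firstMatch_match2_thirdMatch';

// String#match with /g keeps only the full matches, underscore included.
var fullMatches = str.match(/_([^_]+)/g);

// matchAll (which requires the g flag) yields one match object per hit,
// each carrying its capture groups, so group 1 can be read directly.
var viaMatchAll = Array.from(str.matchAll(/_([^_]+)/g), function (m) {
    return m[1];
});

console.log(fullMatches); // ["_firstMatch", "_match2", "_thirdMatch"]
console.log(viaMatchAll); // ["firstMatch", "match2", "thirdMatch"]
```

On older engines without matchAll, the exec loop above is still the way to go.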

2

Since lookbehind is not supported in JS the only way I can think of is using a group like this.

Regex: _([^_]+), then read capture group 1 (match[1] in the result of exec, or $1 in a replacement string).

Regex101 Demo

var myString = "ignore_firstMatch_match2_thirdMatch";
var myRegexp = /_([^_]+)/g;

var match = myRegexp.exec(myString);
while (match !== null) {
  // match[1] is the captured group; match[0] would still include the underscore
  document.getElementById("match").innerHTML += "<br>" + match[1];
  match = myRegexp.exec(myString);
}
<div id="match">

</div>
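
Since this was written, lookbehind was added to the language in ES2018, so on engines that support it the underscore can be excluded from the match directly. A minimal sketch, assuming such an engine (older browsers will throw a SyntaxError on this pattern); the variable names are illustrative only.

```javascript
// Assumption: the engine supports ES2018 lookbehind assertions.
var myString = "ignore_firstMatch_match2_thirdMatch";

// (?<=_) requires an underscore immediately before the match
// without making it part of the matched text.
var afterUnderscore = myString.match(/(?<=_)[^_]+/g);

console.log(afterUnderscore); // ["firstMatch", "match2", "thirdMatch"]
```

Because no capture group is needed here, String#match with /g returns the wanted values as-is.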


An alternate way using lookahead would be something like this.

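But it hung in JS and killed my page three times. The lookahead matches zero characters, so an exec loop never advances past the first match unless lastIndex is incremented by hand; that is an infinite loop rather than a ReDoS-style backtracking problem.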

Regex: (?=_([A-Za-z0-9]+)), then read capture group 1 (match[1] from exec).

Regex101 Demo
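
If you do want to run the lookahead version with exec in a loop, the zero-width match has to be stepped over manually. The following is a sketch of one common way to do that, not necessarily what the demo does; the variable names are hypothetical.

```javascript
var input = "ignore_firstMatch_match2_thirdMatch";
var lookahead = /(?=_([A-Za-z0-9]+))/g;
var found = [];
var hit;

while ((hit = lookahead.exec(input)) !== null) {
    found.push(hit[1]);
    // The match is zero-width, so exec leaves lastIndex at the match position.
    // Advance it by one, otherwise the same position matches forever.
    if (hit.index === lookahead.lastIndex) {
        lookahead.lastIndex++;
    }
}

console.log(found); // ["firstMatch", "match2", "thirdMatch"]
```

Without the lastIndex increment, the loop keeps matching at the first underscore and never terminates.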

1

Why do you assume you need regex? A simple split will do the job (shown here in C#):

string str = "ignore_firstMatch_match2_thirdMatch";
IEnumerable<string> matches = str.Split('_').Skip(1);
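
Since the question is about JavaScript, a rough equivalent of the same idea there would be split followed by slice. This is a sketch with illustrative variable names; note that, unlike the [^_]+ regex, split produces empty strings for consecutive underscores.

```javascript
var source = "ignore_firstMatch_match2_thirdMatch";

// split('_') cuts the string at every underscore;
// slice(1) drops the leading segment before the first underscore.
var parts = source.split('_').slice(1);

console.log(parts); // ["firstMatch", "match2", "thirdMatch"]
```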

1 Comment

I'm not assuming I need one, I know I don't. I'm trying to get better at using regex.
