Hey, I have code like this:

var text = "We are downing to earth";
var regexes = "earth|art|ear";
if (regexes.length) {
  var reg = new RegExp(regexes, "ig");
  console.log(reg);
  var regsult; // declared explicitly so it does not become an implicit global
  while ((regsult = reg.exec(text)) !== null) {
    var word = regsult[0];
    console.log(word); // only logs "earth"
  }
}

I want to get all the matching words from the text. The result should include "earth", "art" and "ear", because "earth" contains both of those substrings. Instead, it only produces "earth".

Is there a mistake in my regex pattern? Or should I use another approach in JS?

Thanks

  • yes, it should be like that Commented Oct 28, 2015 at 5:47
  • I think your original problem is to find matching words in JavaScript. If so, you can break your RegExp into n separate RegExps (where n is the number of words) and run each one against the sentence to find out which words are present in the text. Also, if you are looking for exact matches, you can use a simple string search with indexOf rather than a RegExp. E.g. fiddle: jsfiddle.net/vtxk3zdn Commented Oct 28, 2015 at 6:18

2 Answers 2

2

As discussed in another answer, a single regexp cannot match multiple overlapping alternatives. In your case, simply do a separate regexp test for each word you are looking for:

var text = "We are downing to earth";
var regexes = ["earth", "art", "ear"];

var results = [];
for (var i = 0; i < regexes.length; i++) {
  var word = regexes[i];
  // match() returns null when the word does not occur in the text
  if (text.match(word)) results.push(word);
}

You could tighten this up a little bit by doing

regexes.filter(function(word) { return (text.match(word) || [])[0]; });

If your "regexes" are actually just strings, you could just use indexOf and keep things simpler:

regexes.filter(function(word) { return text.indexOf(word) !== -1; });
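One thing to watch for: the question's regex used the "i" flag, while indexOf is case-sensitive. Below is a minimal sketch of a case-insensitive variant, assuming plain-string words. The helper name findWords and the extra word "moon" are only there for illustration.

```javascript
// Hypothetical helper: returns the words that occur anywhere in the text,
// ignoring case (indexOf on its own is case-sensitive).
function findWords(text, words) {
  var haystack = text.toLowerCase();
  return words.filter(function (word) {
    return haystack.indexOf(word.toLowerCase()) !== -1;
  });
}

var found = findWords("We are downing to Earth", ["earth", "art", "ear", "moon"]);
console.log(found);
```

Lowercasing both sides is good enough for ASCII words like these; it is not a full Unicode case-folding solution.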

2 Comments

This is of course much more sensible than trying to construct a wildly nested regex.
Okay, I used str.indexOf to accomplish this. Thanks for your explanation :)
1

You only get earth as a match because the regex engine has matched earth as the first alternative and then moved on in the source string, oblivious to the fact that you could also have matched ear or art. This is expected behavior with all regex engines - they don't try to return all possible matches, just the first one, and matches generally can't overlap.

Whether earth or ear is returned depends on the regex engine. A POSIX ERE engine will always return the leftmost, longest match, whereas most current regex engines (including JavaScript's) will return the first possible match, depending on the order of alternation in the regex.

So art|earth|ear would return earth, whereas ear|art|earth would return ear.
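A quick way to see the effect of alternation order in JavaScript is to run both patterns against the question's text. This is just a sketch; allMatches is a made-up helper name, not a built-in.

```javascript
var text = "We are downing to earth";

// Hypothetical helper: collect every match of a global, case-insensitive
// regex built from the given pattern string.
function allMatches(pattern, input) {
  var re = new RegExp(pattern, "ig");
  var out = [];
  var m;
  while ((m = re.exec(input)) !== null) {
    out.push(m[0]);
  }
  return out;
}

var longerFirst = allMatches("art|earth|ear", text);
var shorterFirst = allMatches("ear|art|earth", text);

console.log(longerFirst);  // [ 'earth' ]
console.log(shorterFirst); // [ 'ear' ]
```

In the second case the engine consumes "ear" and resumes at "th", so neither art nor earth can match afterwards.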

You can make the regex find overlapping matches (as long as they start in different positions in the string) by using positive lookahead assertions:

(?=(ear|earth|art))

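will find ear and art, but not earth because it starts at the same position as ear. Note that in this case you must not read the regex's entire match (regsult[0] in your code), which is always the empty string, but the content of the capturing group (regsult[1]). Because every match is zero-length, an exec loop also has to advance reg.lastIndex by hand, or it will keep matching at the same position forever.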
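Here is a rough sketch of how the lookahead pattern could be used with exec in JavaScript. A lookahead match is zero-length, so exec leaves lastIndex where the match started; the loop bumps it manually to avoid looping forever. The variable names are my own.

```javascript
var text = "We are downing to earth";
var re = /(?=(ear|earth|art))/ig;

var overlapping = [];
var m;
while ((m = re.exec(text)) !== null) {
  // m[0] is always "", the word itself is in capturing group 1
  overlapping.push(m[1]);
  // Zero-length match: move past this position by hand
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
}

console.log(overlapping); // [ 'ear', 'art' ]
```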

The only way around this that I can think of at the moment would be to use

(?=(ear(th)?|art))

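which would have a result like [["", "earth", "th"], ["", "art", undefined]]. Group 1 holds the longest match at each position, and a defined group 2 tells you that both ear and earth matched there.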
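As a sketch of how the optional group could be turned back into a word list: in the code below group 1 captures the longest alternative at each position, and when group 2 ("th") took part in the match, the shorter word is recovered by cutting that suffix off. This is only one possible way to post-process the result, and it is tied to this specific pattern.

```javascript
var text = "We are downing to earth";
var re = /(?=(ear(th)?|art))/ig;

var words = [];
var m;
while ((m = re.exec(text)) !== null) {
  words.push(m[1]);
  if (m[2] !== undefined) {
    // The optional suffix matched, so the shorter word matched here too
    words.push(m[1].slice(0, m[1].length - m[2].length));
  }
  // Zero-length match: advance manually to avoid an endless loop
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
}

console.log(words); // [ 'earth', 'ear', 'art' ]
```

This gets unwieldy quickly as the word list grows, which is a good argument for testing each word separately.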

1 Comment

Wow, thanks for the concise explanation. But I will go with "str.indexOf" instead :)
