How to check multiple matching words with regex in Javascript

Question

I have code like this:

var text = "We are downing to earth";
var regexes = "earth|art|ear";
var regsult; // declared here so the loop does not create an implicit global
if (regexes.length) {
  var reg = new RegExp(regexes, "ig");
  console.log(reg);
  while ((regsult = reg.exec(text)) !== null) {
    var word = regsult[0];
    console.log(word);
  }
}

I want to get all matching words from the text. The result should include "earth", "art" and "ear", because "earth" contains those substrings. Instead, it only produces "earth".

Is there any mistake with my regex pattern? Or should I use another approach in JS?

Thanks

I think your original problem is to find matching words in JavaScript. If so, you can break your RegExp into n different RegExps (where n is the number of words) and then run each RegExp on the sentence to find out which words are present in the text. Also, if you are looking for an exact match for words, you can use a simple string match with indexOf rather than a RegExp. E.g. fiddle: jsfiddle.net/vtxk3zdn
– Abhas Tandon, Commented Oct 28, 2015 at 6:18

user663031 · Accepted Answer · 2015-10-28 06:27:49Z

2

As discussed in another answer, a single regexp cannot match multiple overlapping alternatives. In your case, simply do a separate regexp test for each word you are looking for:

var text = "We are downing to earth"
var regexes = ["earth", "art", "ear"];

var results = [];
for (var i = 0; i < regexes.length; i++ ) {
  var word = regexes[i];
  if (text.match(word)) results.push(word);
}

You could tighten this up a little bit by doing

regexes.filter(function(word) { return (text.match(word) || [])[0]; });

If your "regexes" are actually just strings, you could just use indexOf and keep things simpler:

regexes.filter(function(word) { return text.indexOf(word) !== -1; });
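
If matching should stay case-insensitive, as the "i" flag in the question suggests, one possible sketch is to lowercase both sides before calling indexOf. The helper name findWords and the extra word "moon" are made up for illustration only:

```javascript
// Hypothetical helper: returns every word that occurs in text, ignoring case.
function findWords(text, words) {
  var haystack = text.toLowerCase();
  return words.filter(function (word) {
    return haystack.indexOf(word.toLowerCase()) !== -1;
  });
}

var found = findWords("We are downing to EARTH", ["earth", "art", "ear", "moon"]);
console.log(found); // ["earth", "art", "ear"]
```

Because each word is checked on its own, overlapping words such as "ear" and "art" inside "earth" are all reported.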

answered Oct 28, 2015 at 6:27

user663031


2 Comments

Tim Pietzcker Over a year ago

This is of course much more sensible than trying to construct a wildly nested regex.

ans4175 Over a year ago

okay, I use str.indexOf to accomplish this. Thanks for your explanation :)

Tim Pietzcker · Accepted Answer · 2015-10-28 06:11:34Z

1

You only get earth as a match because the regex engine has matched earth as the first alternative and then moved on in the source string, oblivious to the fact that you could also have matched ear or art. This is expected behavior with all regex engines - they don't try to return all possible matches, just the first one, and matches generally can't overlap.

Whether earth or ear is returned depends on the regex engine. A POSIX ERE engine will always return the leftmost, longest match, whereas most current regex engines (including JavaScript's) will return the first possible match, depending on the order of alternation in the regex.

So art|earth|ear would return earth, whereas ear|art|earth would return ear.
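
The effect of alternation order can be checked with a small sketch like the following. The helper allMatches is a made-up name, and the sentence and flags are taken from the question:

```javascript
var text = "We are downing to earth";

// Hypothetical helper: collect every match of a global, case-insensitive pattern.
function allMatches(pattern) {
  return text.match(new RegExp(pattern, "ig")) || [];
}

var earthBeforeEar = allMatches("art|earth|ear");
var earBeforeEarth = allMatches("ear|art|earth");
console.log(earthBeforeEar); // ["earth"]
console.log(earBeforeEarth); // ["ear"]
```

In both cases only one word comes back, because the engine resumes searching after the end of the previous match.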

You can make the regex find overlapping matches (as long as they start in different positions in the string) by using positive lookahead assertions:

(?=(ear|earth|art))

will find ear and art, but not earth because it starts at the same position as ear. Note that you must not look at the regex's entire match (regsult[0] in your code) in this case, but at the content of the capturing group (regsult[1]).
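
A minimal sketch of the lookahead approach, assuming an environment that supports String.prototype.matchAll (ES2020). Every match of this pattern is zero-length, so an exec loop like the one in the question would never terminate unless lastIndex were advanced by hand; matchAll takes care of that step:

```javascript
var text = "We are downing to earth";

// The lookahead matches an empty string at each position, so the words
// are read from capture group 1 rather than from the whole match.
var pattern = /(?=(ear|earth|art))/gi;

// matchAll steps past zero-length matches on its own.
var overlapping = Array.from(text.matchAll(pattern), function (m) {
  return m[1];
});
console.log(overlapping); // ["ear", "art"]
```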

The only way around this that I can think of at the moment would be to use

(?=(ear(th)?|art))

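which would have a result like [["", "earth", "th"], ["", "art", undefined]]. The first group holds the longest word found at that position; when the second group is defined, the match was earth, which implies that ear is present as well.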

edited Oct 28, 2015 at 6:11

answered Oct 28, 2015 at 6:06

Tim Pietzcker


1 Comment

ans4175 Over a year ago

wooh, thanks and a concise explanation. But i should go with "str.indexOf" instead :)
