Regex to not match a pattern in string

Question

I am a newbie and have been struggling for the last hour to figure this out. Let's say you have these strings:

baa cec haw heef baas bat jackaay

I want to match all the words that don't contain two consecutive a's (aa), so in the above it should match cec, haw, heef and bat.

This is what I have done so far, but I can sense it's completely wrong :D

\w*[^\s]*[^a\s]{2}[^\s]*\w*

@Kevin Guan: Changed just a sec before you posted the comment, :p. He said "strings", which I interpreted as a collection/list.
– mshsayem, Commented Nov 7, 2015 at 11:28

Wiktor Stribiżew · Accepted Answer · 2015-11-07 16:01:10Z

1

You need a regex with two things: a word boundary \b and a negative lookahead right after it. Placed there, the lookahead is effectively anchored at the start of each word and restricts what the subpattern that follows may match.

\b(?!\w*aa)\w+

See the regex demo

Regex breakdown:

\b - word boundary
(?!\w*aa) - a negative lookahead that fails the match if, from the current position, 0 or more word characters are followed by aa (that is, if the word contains aa)
\w+ - 1 or more word characters.

Code demo:

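var re = /\b(?!\w*aa)\w+/gi; // g: find all matches, i: also reject "Aa", "aA" and "AA"
var str = 'baa cec haw heef bAas bat jackaay bar ha aa lar';
var res = str.match(re);
console.log(JSON.stringify(res)); // ["cec","haw","heef","bat","bar","ha","lar"]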
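
To see why the word boundary matters, here is a minimal sketch (not part of the original answer) that runs the same lookahead with and without \b on a shortened input. The variable names are illustrative only. Without the boundary, a match may start in the middle of a word, past the point where aa still lies ahead, so the trailing a of baa slips through.

```javascript
// Same lookahead, with and without the leading word boundary.
const withBoundary = /\b(?!\w*aa)\w+/g;
const withoutBoundary = /(?!\w*aa)\w+/g;

const sample = 'baa cec';

// With \b, a match may only start at the beginning of a word,
// so the lookahead always inspects the whole word.
const anchored = sample.match(withBoundary);

// Without \b, a match may start at the last "a" of "baa",
// where no "aa" lies ahead any more.
const unanchored = sample.match(withoutBoundary);

console.log(anchored);   // [ 'cec' ]
console.log(unanchored); // [ 'a', 'cec' ]
```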

edited Nov 7, 2015 at 16:01

answered Nov 7, 2015 at 15:10

Wiktor Stribiżew

2 Comments

Stefan Bratanov Over a year ago

Thank you so much. I will try hard to understand how this works, but it works and is exactly what I wanted to achieve.

Wiktor Stribiżew Over a year ago

Please see the Word Boundaries and Lookahead and Lookbehind Zero-Length Assertions articles. The main pattern is \w (a subpattern matching alphanumerics and underscore, [a-zA-Z0-9_]) that we match 1 or more times (+), but only starting at a word boundary (the start of the string or after a non-word character) and only if the word has no aa: the lookahead first checks whether aa occurs after zero or more word characters (\w*).

Community · Accepted Answer · 2017-05-23 10:27:17Z

1

You may want to use a negative lookahead:

/(^|\s)(?!\w*aa\w*)(\w+)/gi

You can check your string by pasting this code into the console in Chrome/Firefox (F12):

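var pattern = /(^|\s)(?!\w*aa\w*)(\w+)/gi;
var str = 'baa cec haw heef baas bat jackaay';
var match; // declare the loop variable instead of creating an implicit global
while ((match = pattern.exec(str)) !== null)
    console.log(match[2]); // group 2 is (\w+) in the regex; group 1 is the leading ^ or whitespace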

You can read more about lookahead here. See it on Regex101 to see how this regex works.

edited May 23, 2017 at 10:27

CommunityBot

answered Nov 7, 2015 at 12:32

Van M. Tran

2 Comments

Stefan Bratanov Over a year ago

Thank you. This almost works. It just captures the spaces between words as well as part of the group.

Van M. Tran Over a year ago

If you select the 2nd group, you will get the expected string. But I think @stribizhev has a better solution.
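
The difference discussed in these comments, that the full match includes the leading whitespace while group 2 holds only the word, can be sketched as follows. This is an illustration rather than part of the original answer: it assumes an environment that supports String.prototype.matchAll, and the names fullMatches and wordsOnly are made up for the example.

```javascript
const pattern = /(^|\s)(?!\w*aa\w*)(\w+)/gi;
const str = 'baa cec haw heef baas bat jackaay';

const fullMatches = []; // whole match, including the (^|\s) prefix
const wordsOnly = [];   // capture group 2, i.e. the (\w+) part

for (const m of str.matchAll(pattern)) {
  fullMatches.push(m[0]);
  wordsOnly.push(m[2]);
}

console.log(fullMatches); // [ ' cec', ' haw', ' heef', ' bat' ]
console.log(wordsOnly);   // [ 'cec', 'haw', 'heef', 'bat' ]
```

Reading group 2 avoids the stray spaces without changing the pattern itself.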

BenG · Accepted Answer · 2015-11-07 11:47:25Z

0

In JavaScript, you could use filter and negate (!) the result of a regex test. The pattern below uses a non-capturing group (?:).

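// Requires jQuery: $(...).filter() passes (index, element) to the callback.
var strings = ['baa','cec','haw','heef','baas','bat','jackaay'];
strings = $(strings).filter(function(index, element){
   return !/.*(?:aa).*/.test(element);                // keep only elements that do NOT contain "aa"
});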
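
For comparison, the same filtering idea can be sketched without jQuery, using the built-in Array.prototype.filter. This is an assumed variant, not the answerer's code: the pattern is reduced to /aa/i (the surrounding .* and the non-capturing group are not needed for a plain containment test), and the variable names are illustrative.

```javascript
const words = ['baa', 'cec', 'haw', 'heef', 'baas', 'bat', 'jackaay'];
const hasDoubleA = /aa/i; // no g flag, so test() keeps no state between calls

// Keep only the words in which the pattern does NOT occur.
const kept = words.filter((word) => !hasDoubleA.test(word));
console.log(kept); // [ 'cec', 'haw', 'heef', 'bat' ]

// The same idea applied to a single space-separated string.
const sentence = 'baa cec haw heef baas bat jackaay';
const keptFromSentence = sentence
  .split(/\s+/)
  .filter((word) => !hasDoubleA.test(word));
console.log(keptFromSentence); // [ 'cec', 'haw', 'heef', 'bat' ]
```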

answered Nov 7, 2015 at 11:47

BenG
