
I am a newbie and have been struggling for the last hour to figure this out. Let's say you have these strings:

baa cec haw heef baas bat jackaay

I want to match all the words that don't contain two consecutive a's (aa), so in the above it should match cec, haw, heef, and bat.

This is what I have tried so far, but I can sense it's completely wrong :D

\w*[^\s]*[^a\s]{2}[^\s]*\w*
  • So JavaScript or Python or Perl? Commented Nov 7, 2015 at 11:26
  • Python: [s for s in myStrings if 'aa' not in s] Commented Nov 7, 2015 at 11:26
  • @mshsayem [s for s in myStrings.split() if 'aa' not in s] Commented Nov 7, 2015 at 11:28
  • @Kevin Guan: Changed it just a second before you posted the comment, :p. He said "strings", which I interpreted as a collection/list. Commented Nov 7, 2015 at 11:28
  • It is regex in general; I am testing with regex101.com. Commented Nov 7, 2015 at 11:42

3 Answers


You need a regex with two things: a word boundary \b and a negative lookahead right after it. The word boundary anchors the lookahead to the start of each word, so the lookahead restricts the subpattern that follows.

\b(?!\w*aa)\w+

See the regex demo

Regex breakdown:

  • \b - word boundary
  • (?!\w*aa) - a negative lookahead that fails the match if, from this position, 0 or more word characters are followed by aa (that is, if the word contains two consecutive a's)
  • \w+ - 1 or more word characters.

Code demo:

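// g: find all matches; i: case-insensitive, so "Aa" also counts as "aa"
var re = /\b(?!\w*aa)\w+/gi;
var str = 'baa cec haw heef bAas bat jackaay bar ha aa lar';
var res = str.match(re);
// Writes ["cec","haw","heef","bat","bar","ha","lar"]
document.write(JSON.stringify(res));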
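
To see why the word boundary matters, here is a minimal sketch comparing the pattern with and without \b. The names withBoundary, withoutBoundary and sample are only illustrative and not part of the answer. Without \b, the engine can retry from the middle of a rejected word and match its tail.

```javascript
// Same pattern as above, with and without the leading \b.
const withBoundary = /\b(?!\w*aa)\w+/g;
const withoutBoundary = /(?!\w*aa)\w+/g;

const sample = 'baa cec';

// With \b a match can only start at the beginning of a word,
// so "baa" is rejected as a whole.
const anchored = sample.match(withBoundary);

// Without \b the engine retries inside "baa"; at the final "a"
// no "aa" lies ahead anymore, so that tail is matched.
const unanchored = sample.match(withoutBoundary);

console.log(anchored);   // [ 'cec' ]
console.log(unanchored); // [ 'a', 'cec' ]
```

The stray 'a' in the second result is the kind of partial match that \b prevents.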


2 Comments

Thank you so much. I will try hard to understand how this works, but it works and is exactly what I wanted to achieve.
Please see the "Word Boundaries" and "Lookahead and Lookbehind Zero-Length Assertions" articles. The main pattern is \w (a subpattern matching an alphanumeric character or underscore), which we match 1 or more times (+). A match can only start at a word boundary, i.e. at the start of the string or after a non-word character (anything other than [a-zA-Z0-9_]), and only if the word has no aa: the lookahead first checks whether aa appears after zero or more word characters (\w*).

You may want to use a negative lookahead:

/(^|\s)(?!\w*aa\w*)(\w+)/gi

You can check your string by pasting this code into the console in Chrome/Firefox (F12):

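var pattern = /(^|\s)(?!\w*aa\w*)(\w+)/gi;
var str = 'baa cec haw heef baas bat jackaay';
var match; // declared explicitly to avoid creating an implicit global
while ((match = pattern.exec(str)) !== null)
    console.log(match[2]); // group 2 is (\w+), the word without the leading whitespace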

You can read more about lookahead here. Try it on Regex101 to see how this regex works.
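
As a rough sketch, the exec loop can also be written with String.prototype.matchAll (ES2020), collecting only group 2 so the leading whitespace never ends up in the result. The helper name wordsWithoutDoubleA is hypothetical and only used for illustration.

```javascript
// Hypothetical helper (the name is illustrative): returns the words
// that do not contain "aa".
function wordsWithoutDoubleA(text) {
  // Same pattern as in the answer: group 1 is the start of the string
  // or a whitespace character, group 2 is the word itself.
  const pattern = /(^|\s)(?!\w*aa\w*)(\w+)/gi;
  // matchAll requires the g flag and yields one match array per hit.
  return Array.from(text.matchAll(pattern), (m) => m[2]);
}

console.log(wordsWithoutDoubleA('baa cec haw heef baas bat jackaay'));
// [ 'cec', 'haw', 'heef', 'bat' ]
```

Picking m[2] rather than m[0] is what drops the whitespace captured by group 1.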

2 Comments

Thank you. This almost works. It just captures the spaces between words as well as part of the group.
If you select the 2nd group, you will get the expected string. But I think @stribizhev has a better solution.

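In JavaScript, you could use filter and negate (!) a regex test; the regex here uses a non-capturing group (?:...). Note that this snippet relies on jQuery's filter.

var strings = ['baa','cec','haw','heef','baas','bat','jackaay'];
// $(...).filter is jQuery's filter: the callback receives (index, element)
strings = $(strings).filter(function(index, element){
   return !/.*(?:aa).*/.test(element);                // regex => .*(?:aa).*
});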
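
If jQuery is not available, the same idea can be sketched with the built-in Array.prototype.filter. This is a simplification rather than the answer's exact code: the variable name withoutDoubleA is illustrative, and the pattern is reduced to /aa/ because test() already searches anywhere in the string.

```javascript
const strings = ['baa', 'cec', 'haw', 'heef', 'baas', 'bat', 'jackaay'];

// Keep only the elements in which /aa/ finds no match.
// The surrounding .* and the (?:...) group are not needed for a plain test().
const withoutDoubleA = strings.filter((s) => !/aa/.test(s));

console.log(withoutDoubleA); // [ 'cec', 'haw', 'heef', 'bat' ]
```

If the input is one space-separated string rather than an array, splitting it first with split(/\s+/) would give the array to filter.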
