6

I have a good regexp for replacing repeating characters in a string. Now I also need to handle repeating words: three or more consecutive occurrences of the same word should be replaced by two.

Like

bye! bye! bye!

should become

bye! bye!

My code so far:

import re

def replaceThreeOrMoreCharachetrsWithTwoCharacters(string):
    # Pattern to look for three or more repetitions of any character, including newlines.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL)
    return pattern.sub(r"\1\1", string)
4
  • What is your regex for repeating characters? Commented Aug 24, 2014 at 17:27
  • and if you have bye! bye! bye! bye! bye! bye! what should be the output ? :) Commented Aug 24, 2014 at 17:27
  • @alfasin: bye! bye!: "three or more word will be replaced by two words" Commented Aug 24, 2014 at 17:30
  • This is for characters: def replaceThreeOrMoreCharachetrsWithTwoCharacters(string): pattern = re.compile(r"(.)\1{2,}", re.DOTALL) return pattern.sub(r"\1\1", string) Commented Aug 24, 2014 at 17:32

5 Answers 5

5

Assuming that a "word" in your requirements is one or more non-whitespace characters surrounded by whitespace or by the start/end of the string, you can try this pattern:

re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\1', s)
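A quick way to check the pattern is to run it on the strings discussed in this thread. The sketch below wraps the same substitution in a helper; the name `squeeze_repeated_words` is made up for illustration and is not part of the original answer.

```python
import re

# Same pattern as above: group 1 captures the first two occurrences
# (with the whitespace between them), group 2 captures the word itself.
PATTERN = re.compile(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)')

def squeeze_repeated_words(s):
    """Collapse runs of three or more identical words down to two."""
    return PATTERN.sub(r'\1', s)

print(squeeze_repeated_words("bye! bye! bye!"))       # bye! bye!
print(squeeze_repeated_words("he he he hemisphere"))  # he he hemisphere
print(squeeze_repeated_words("hi hi hippopotamus"))   # hi hi hippopotamus
```

Because the replacement is group 1 rather than a rebuilt string, whatever whitespace separated the first two occurrences is kept as it was.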

4 Comments

Didn't work with this: s = "hi hi hi some words words words which'll repeat repeat repeat repeat repeat" print re.sub(r'(?<!\S)((\S+)(?:\s+\2)(?:\s+\2)+)(?!\S)', r'\1', s) same output.
@hjpotter92: (?!\S) doesn't require a character to be here, it will match the beginning/end of string while \s wouldn't.
@Bjorn: I have tested it and it works well with your string, you have perhaps missed something.
@hjpotter92: Yes, Robin is right: (?!\S) is used as a kind of word boundary that checks that what follows is either whitespace or the end of the string. The goal of this check is to avoid matching, for example, hi hi hippopotamus. Furthermore, since there is no need to consume a character after the word, I chose to put it in a lookahead.
3

You could also try the regex below:

(?<= |^)(\S+)(?: \1){2,}(?= |$)

Sample code:

>>> import regex
>>> s = "hi hi hi hi some words words words which'll repeat repeat repeat repeat repeat"
>>> m = regex.sub(r'(?<= |^)(\S+)(?: \1){2,}(?= |$)', r'\1 \1', s)
>>> m
"hi hi some words words which'll repeat repeat"

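Note that the sample above imports the third-party `regex` module. The standard-library `re` module rejects `(?<= |^)` because the two lookbehind alternatives have different widths. A minimal sketch of a roughly equivalent pattern for `re` follows, assuming words are separated by single spaces as in the original; the function name `collapse_repeats` is hypothetical.

```python
import re

# (?<![^ ]) : not preceded by a non-space, i.e. preceded by a space or at the start
# (?![^ ])  : not followed by a non-space, i.e. followed by a space or at the end
PATTERN = re.compile(r'(?<![^ ])(\S+)(?: \1){2,}(?![^ ])')

def collapse_repeats(s):
    """Replace three or more space-separated repeats of a word with two."""
    return PATTERN.sub(r'\1 \1', s)

s = "hi hi hi hi some words words words which'll repeat repeat repeat repeat repeat"
print(collapse_repeats(s))
# hi hi some words words which'll repeat repeat
```

Both lookarounds are negative and exactly one character wide, which keeps the lookbehind fixed-width.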

4 Comments

Be careful with the word boundaries: this will match "the he hemisphere" as well.
Beware, it still matches he he hemisphere. Also here it seems that words are not just alphanum strings, adding \b makes the regex fail on the hi! hi! hi! first example.
Seems to have a problem with: s = "bye! bye! bye!"
@AvinashRaj: it is still failing on he he hemisphere
2

I know you were after a regular expression but you could use a simple loop to achieve the same thing:

def max_repeats(s, max=2):
  last = ''
  out = []
  for word in s.split():
    same = 0 if word != last else same + 1
    if same < max: out.append(word)
    last = word
  return ' '.join(out)

As a bonus, I have allowed a different maximum number of repeats to be specified (the default is 2). If there is more than one space between two words, the extra whitespace will be lost. It's up to you whether you consider that to be a bug or a feature :)
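The same idea can also be sketched with `itertools.groupby`, which groups consecutive equal words so that each run can be truncated. This is an alternative formulation, not the code above; the names `max_repeats_grouped` and `limit` are chosen for illustration. Like the loop, it normalises all whitespace to single spaces.

```python
from itertools import groupby, islice

def max_repeats_grouped(s, limit=2):
    """Keep at most `limit` consecutive occurrences of each word."""
    out = []
    # groupby yields one (word, run) pair per run of identical adjacent words
    for word, run in groupby(s.split()):
        # islice keeps only the first `limit` items of the run
        out.extend(islice(run, limit))
    return ' '.join(out)

print(max_repeats_grouped("hi hi hi some words words words"))
# hi hi some words words
```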

Comments

0

Try the following:

import re
s = "bye! bye! bye!"  # replace with your own string
s = re.sub(r'(\S+) (?:\1 ?){2,}', r'\1 \1', s)

You can see a sample code here: http://codepad.org/YyS9JCLO

1 Comment

Got a problem with this: s = "hi hi hi some words words words which'll repeat repeat repeat repeat repeat"
0
import re

def replaceThreeOrMoreWordsWithTwoWords(string):
    # Pattern to look for three or more repetitions of any word.
    pattern = re.compile(r"(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)", re.DOTALL)
    return pattern.sub(r"\1", string)
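For readers who want the pattern spelled out, here is the same expression written with `re.VERBOSE` so each piece can carry a comment. This is only an annotated sketch; the names `WORD_RUN` and `replace_runs` are hypothetical. The `re.DOTALL` flag is left out because the pattern contains no `.`, so it has no effect here.

```python
import re

WORD_RUN = re.compile(r"""
    (?<!\S)          # not preceded by a non-whitespace character
    (                # group 1: the two occurrences to keep
        (\S+)        # group 2: the word itself
        (?:\s+\2)    # whitespace, then the same word again
    )
    (?:\s+\2)+       # one or more further repetitions, dropped by the substitution
    (?!\S)           # not followed by a non-whitespace character
""", re.VERBOSE)

def replace_runs(string):
    """Replace three or more repetitions of a word with the first two."""
    return WORD_RUN.sub(r"\1", string)

print(replace_runs("bye! bye! bye!"))  # bye! bye!
```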

1 Comment

While this code may answer the question, it is better to include an explanation of what it does, and why it is a better answer than the others offered. For regex answers, a demo using a site like regex101 can be extremely useful.
