0

I need to replace any word that contains a given substring with another string. For example:

biography -> biography
biographical -> biography
biopic -> biography
bio-pic -> biography-pic
I watched a biographical movie -> I watched a biography movie

Here, every word on the left contains bio, so the whole word is replaced by biography. I am aware of the str.replace() method, but it doesn't seem to work well here. I looked into regular expressions, but I'm not sure whether re is the right module for this problem.

  • 1
    doesn't seem to work well here? What do you mean by that? Commented Dec 13, 2019 at 18:22
  • 1
    @SukumarRdjf Good question, I suspect that the problem is that it maps biography to biographygraphy Commented Dec 13, 2019 at 18:24
  • 2
    First split string to words then apply suggestion of @SubhrajyotiDas to each word. Commented Dec 13, 2019 at 18:25
  • 1
    @Sukumar-Rdjf I mean that I don't want to construct the dictionary consisting of all possible words containing bio, and then call string.replace(word, bio) for every such word Commented Dec 13, 2019 at 18:27
  • 1
    Are you certain that you won't have things like 'biology' which have nothing to do with 'biography' but nevertheless contain 'bio'? Commented Dec 13, 2019 at 18:30

4 Answers

1

Using Regex

import re

s = """
biography -> biography
biographical -> biography
biopic -> biography
bio-pic -> biography-pic
I watched a biographical movie -> I watched a biography movie
"""
x = re.sub(r'\b(bio\w*)', 'biography', s)
print(x)

Output

biography -> biography
biography -> biography
biography -> biography
biography-pic -> biography-pic
I watched a biography movie -> I watched a biography movie

0
import re

search_string = 'bio'
replace_string = 'biography'
vals = ['biography', 'biographical', 'biopic', 'bio-pic', 'something else', 'bio pic', 'I watched a biographical movie']
altered = [re.sub(re.escape(search_string) + r'\w*', replace_string, val) for val in vals]
print(altered)

outputs

['biography', 'biography', 'biography', 'biography-pic', 'something else', 'biography pic', 'I watched a biography movie']

For the regex part, re.escape() escapes any regex metacharacters in the search string so that it is matched literally; I assumed your 'bio' search string will not be constant. In the rest of the pattern, \w matches a single word character (a-z, A-Z, 0-9, and _, plus other Unicode letters and digits for str patterns in Python 3), and * means zero or more of the preceding item. Since only word characters are matched, the match stops at the first space, hyphen, or other non-word character.
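As a minimal sketch of how re.escape() and \w* combine, the hypothetical helper below (replace_words is an illustrative name, not part of the answer above) also adds a leading \b, on the assumption that only words starting with the search string should be replaced. Without \b, a word such as symbiosis would also be affected.

```python
import re

def replace_words(text, prefix, replacement):
    # Hypothetical helper: replace every word that starts with `prefix`.
    # re.escape() makes the prefix match literally; \b anchors it to a word start.
    pattern = r'\b' + re.escape(prefix) + r'\w*'
    return re.sub(pattern, replacement, text)

print(replace_words('bio-pic and symbiosis', 'bio', 'biography'))
# prints: biography-pic and symbiosis
```

Dropping the r'\b' part would turn symbiosis into symbiography, so whether to keep it depends on which words should count as matches.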

0

Try a regular expression to solve this problem; it will definitely work. You can adjust the pattern to suit your requirements. Here is some example code:

import re
s = "biography biographical biopic bio-pic I watched a biographical movie"
replaced = re.sub('(bio[A-Za-z]*)', 'biography', s)
print(replaced)
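The character class [A-Za-z]* behaves differently from \w* when a word contains digits or underscores. The short sketch below compares the two on a made-up sample string (the inputs bio_pic and bio2020 are illustrative and do not come from the question).

```python
import re

sample = "biopic bio_pic bio2020"

# [A-Za-z]* stops at the first digit or underscore, so the tail is left in place.
letters_only = re.sub(r'bio[A-Za-z]*', 'biography', sample)

# \w* also consumes digits and underscores, so the whole token is replaced.
word_chars = re.sub(r'bio\w*', 'biography', sample)

print(letters_only)  # biography biography_pic biography2020
print(word_chars)    # biography biography biography
```

Which of the two is preferable depends on whether digits and underscores should be treated as part of the word.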

0

One possible solution:

import re

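def f(s, pat, replace):
    # Match any whole word that contains `pat`; re.escape() keeps `pat` literal.
    pat = r'\w*%s\w*' % re.escape(pat)
    # Use the `replace` argument instead of a hard-coded replacement string.
    return re.sub(pat, replace, s)

# Named `text` to avoid shadowing the built-in input().
text = """
biography -> biography
biographical -> biography
biopic -> biography
bio-pic -> biography-pic
I watched a biographical movie -> I watched a biography movie
"""

c = f(text, "bio", "biography")
print(c)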

Output:

biography -> biography
biography -> biography
biography -> biography
biography-pic -> biography-pic
I watched a biography movie -> I watched a biography movie
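
One of the comments on the question asks about words such as biology, which contain bio but have nothing to do with biography. A minimal sketch of one way to handle that is to pass a function as the replacement argument to re.sub() and skip words on an exclusion list. The KEEP set and the name replace_unless_kept are assumptions made for illustration; they do not come from the question.

```python
import re

# Hypothetical exclusion list, not from the question.
KEEP = {'biology', 'biologist'}

def replace_unless_kept(text, substring, replacement, keep=KEEP):
    # Match any whole word that contains `substring` (taken literally).
    pattern = r'\w*' + re.escape(substring) + r'\w*'

    def swap(match):
        word = match.group(0)
        # Leave excluded words untouched; replace everything else.
        return word if word.lower() in keep else replacement

    return re.sub(pattern, swap, text)

print(replace_unless_kept('a biology book and a biographical movie', 'bio', 'biography'))
# prints: a biology book and a biography movie
```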
