
I wanted to replace a certain pattern (a space between two letters) multiple times in a line.
Here's my code:

import re
s = re.sub(r'([a-z]) ([a-z])', r'\g<1>_\g<2>', 'series m coupe')

I expected 'series m coupe' to become 'series_m_coupe', but I got 'series_m coupe'. Even passing count=0 didn't help...

I guess it's because "m" is a single letter. When I use a longer word, as in 'series mini coupe', it works:

>>> s = re.sub(r'([a-z]) ([a-z])', r'\g<1>_\g<2>', 'series mini coupe')
>>> s
'series_mini_coupe'

Answer


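With ([a-z]) ([a-z]), the first match in 'series m coupe' is 's m', and that match consumes the m. The regex engine then resumes searching right after the m, where only ' coupe' remains: the second space has no unconsumed letter before it, so no second match is found. This is also why count=0 changes nothing: it is already the default (replace all matches), and there is only one match to replace.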
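To see the consumption directly, here is a small sketch using re.finditer, which scans for non-overlapping matches left to right the same way re.sub does. The helper name show_matches is hypothetical, chosen only for illustration:

```python
import re

# Hypothetical helper: list the (span, matched text) of every
# non-overlapping match, in the order the regex engine finds them.
def show_matches(pattern: str, text: str) -> list[tuple[tuple[int, int], str]]:
    return [(m.span(), m.group()) for m in re.finditer(pattern, text)]

print(show_matches(r'([a-z]) ([a-z])', 'series m coupe'))
# [((5, 8), 's m')]
print(show_matches(r'([a-z]) ([a-z])', 'series mini coupe'))
# [((5, 8), 's m'), ((10, 13), 'i c')]
```

With 'series mini coupe' the second match 'i c' exists only because the word "mini" leaves an unconsumed letter before the second space, which is why longer words appear to work.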

You need to use a lookahead to match overlapping strings:

import re
s = re.sub(r'([a-z]) (?=[a-z])', r'\g<1>_', 'series m coupe')
                     ^^^     ^


The (?=[a-z]) lookahead checks that the space is followed by a lowercase ASCII letter without consuming that letter, so the letter is still available to be captured by group 1 of the next match. Remove \g<2> from the replacement pattern, since the regex no longer has a second capturing group.
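A minimal sketch that checks the answer's pattern on a few inputs. It also shows a variation that is not from the answer: adding a lookbehind (?<=[a-z]) so that only the space itself is consumed, and the replacement needs no group reference at all. Both function names are hypothetical:

```python
import re

# The answer's pattern: group 1 keeps the letter before the space, and the
# lookahead only checks the next letter, leaving it for the next match.
def underscore_answer(text: str) -> str:
    return re.sub(r'([a-z]) (?=[a-z])', r'\g<1>_', text)

# Variation (not from the answer): lookbehind + lookahead are both zero-width,
# so the match is just the space and it is replaced with '_' directly.
def underscore_lookaround(text: str) -> str:
    return re.sub(r'(?<=[a-z]) (?=[a-z])', '_', text)

for sample in ['series m coupe', 'series mini coupe', 'a b c d']:
    print(underscore_answer(sample), underscore_lookaround(sample))
# series_m_coupe series_m_coupe
# series_mini_coupe series_mini_coupe
# a_b_c_d a_b_c_d
```

Both versions handle runs of single-letter words like 'a b c d', because neither consumes the letter after the space.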


Comment

Wow!! What smart, wonderful code!! Thanks, I had no idea about "lookahead". It's really helpful :)
