I am looking for a regular expression to match:

[document n] m

in order to get rid of [document n] only when n=m

where n is any number

So [document 34] 34 should match, but [document 34] 45 should not, because the numbers are different.

So far I have this:

import re
text = "[document 23] 23 and [document 34] 48 are white"
text = re.sub(r"(\[document \d+\] )(\d+)",r"\2. ",text)

But this does not ensure that the numbers are equal.
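To make the problem concrete, here is a minimal run of the attempt above. The variable name `result` is only for illustration. Both pairs are rewritten, including the one whose numbers differ, because the second group matches any run of digits:

```python
import re

text = "[document 23] 23 and [document 34] 48 are white"

# Group 2 is just \d+; nothing ties it to the number inside the brackets,
# so the mismatched pair "[document 34] 48" is replaced as well.
result = re.sub(r"(\[document \d+\] )(\d+)", r"\2. ", text)
print(result)  # 23.  and 48.  are white
```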

Any idea?

  • d matches d, \d matches digits. Commented Apr 24, 2021 at 23:22
  • \d+ instead of "+d\"? Commented Apr 24, 2021 at 23:24

1 Answer

You can use

\[document\s+(\d+)]\s+\1(?!\d)

See the regex demo. Replace each match with \1, which keeps only the number. Details:

  • \[document - the literal string [document
  • \s+ - one or more whitespaces
  • (\d+) - Group 1 (\1): one or more digits
  • ] - a ] char
  • \s+ - one or more whitespaces
  • \1 - backreference to Group 1
  • (?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
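The effect of the trailing lookahead can be checked in isolation. The sketch below is only an illustration (the variable names are made up for it): it runs the pattern with and without (?!\d) on an input where the second number merely starts with the first.

```python
import re

text = "[document 24] 240 text"

# Without the lookahead, \1 matches the "24" at the start of "240",
# so the label is removed even though 24 != 240.
without_lookahead = re.sub(r'\[document\s+(\d+)]\s+\1', r'\1', text)

# With (?!\d), the match is rejected because a digit follows the backreference.
with_lookahead = re.sub(r'\[document\s+(\d+)]\s+\1(?!\d)', r'\1', text)

print(without_lookahead)  # 240 text
print(with_lookahead)     # [document 24] 240 text
```

A leading guard is not needed here, because (\d+) is bounded on the left by the whitespace after "document" and on the right by the ] character.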

See the Python demo:

import re
text = "[document 23] 23 and [document 34] 48 are white [document 24] 240 text"
print( re.sub(r'\[document\s+(\d+)]\s+\1(?!\d)', r'\1', text) )
## => 23 and [document 34] 48 are white [document 24] 240 text