python regex repeated pattern

Question

I am looking for a regular expression to match:

[document n] m

so that I can remove [document n], but only when n = m, where n and m can be any numbers.

For example, [document 34] 34 should match, but [document 34] 45 should not, because the numbers are different.

So far I have this:

import re
text = "[document 23] 23 and [document 34] 48 are white"
text = re.sub(r"(\[document \d+\] )(\d+)",r"\2. ",text)

But this does not ensure that the numbers are equal.

Any idea?

d matches d, \d matches digits.

– Wiktor Stribiżew, Commented Apr 24, 2021 at 23:22
\d+ instead of "+d\"?

– SuperStormer, Commented Apr 24, 2021 at 23:24

Wiktor Stribiżew · Accepted Answer · 2021-04-24 23:25:04Z

5

You can use

\[document\s+(\d+)]\s+\1(?!\d)

See the regex demo. Replace with \1. Details:

\[document - [document string
\s+ - one or more whitespaces
(\d+) - Group 1 (\1): one or more digits
] - a ] char
\s+ - one or more whitespaces
\1 - backreference to Group 1
(?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
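
To see why the trailing lookahead matters, here is a small sketch that runs the pattern with and without (?!\d) on the same input. The names with_guard and without_guard are made up for this illustration:

```python
import re

# Hypothetical names for illustration; the pattern is the one from this answer,
# once with and once without the trailing negative lookahead.
with_guard = re.compile(r'\[document\s+(\d+)]\s+\1(?!\d)')
without_guard = re.compile(r'\[document\s+(\d+)]\s+\1')

sample = "[document 24] 240 text"

# Without the lookahead, "24" is matched as a prefix of "240",
# so the tag is removed even though 24 != 240.
print(without_guard.sub(r'\1', sample))

# With the lookahead, the backreference must not be followed by another digit,
# so the string is left unchanged.
print(with_guard.sub(r'\1', sample))
```

The first call prints 240 text (a false positive), while the second prints the input unchanged.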

See the Python demo:

import re
text = "[document 23] 23 and [document 34] 48 are white [document 24] 240 text"
print(re.sub(r'\[document\s+(\d+)]\s+\1(?!\d)', r'\1', text))
# => 23 and [document 34] 48 are white [document 24] 240 text
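
Note that \1 compares text, not numeric values, so something like [document 07] 7 would not be treated as equal. If that case matters, one possible alternative (a different technique from the backreference above) is a replacement function that compares the two numbers as integers. This is only a sketch; the names TAG and strip_matching_tags are made up for it:

```python
import re

# Capture both numbers separately. No lookahead is needed here: \d+ is greedy,
# so the second group takes the whole number and the check is done in Python.
TAG = re.compile(r'\[document\s+(\d+)]\s+(\d+)')

def strip_matching_tags(text):
    """Drop '[document n] ' when the number after it has the same integer value."""
    def repl(m):
        n, following = m.group(1), m.group(2)
        # Keep only the trailing number on a match; otherwise leave the text as is.
        return following if int(n) == int(following) else m.group(0)
    return TAG.sub(repl, text)

print(strip_matching_tags("[document 23] 23 and [document 34] 48 and [document 07] 7"))
# => 23 and [document 34] 48 and 7
```

The trade-off is that the equality test moves out of the regex and into Python code, which is slightly longer but makes the comparison rule explicit.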
