Editing a text file using python

Question

I have an auto-generated bibliography file that stores my references. The citekey in the generated file has the form xxxxx:2009tb. Is there a way to make the program detect such a pattern and change the citekey to the form xxxxx:2009?

Use Python regular expressions: read the file line by line, get the string, then replace it. tutorialspoint.com/python/string_replace.htm
– linello, Commented Nov 7, 2012 at 11:30
It's difficult to get a pattern from only one example. Could you post, say, five to ten different occurrences of these references as they appear, and the corresponding desired outputs?
– heltonbiker, Commented Nov 7, 2012 at 12:02

RParadox · Accepted Answer · 2012-11-07 12:14:02Z


It's not quite clear to me which expression you want to match, but you can build everything with a regex, using import re and re.sub as shown. [0-9]{4} matches exactly four digits. (Edited to incorporate suggestions.)

import re

inf = 'temp.txt'
outf = 'out.txt'

# the with statement closes both files, so no explicit close() is needed
with open(inf) as f, open(outf, 'w') as o:
    text = f.read()
    # keep the key and the year (group 1) and drop the trailing "tb"
    text = re.sub(r"(xxxxx:[0-9]{4})tb", r"\1", text)  # match your regex here
    o.write(text)
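Since the author part of the key is not fixed, the same read, substitute, write approach can be sketched with a more general pattern. This is only a sketch under assumptions: the names CITEKEY, strip_citekey_suffix and rewrite_file are made up for illustration, and the pattern assumes every citekey is word characters, a colon, a four-digit year, and exactly two letters.

```python
import re

# Hypothetical pattern: a run of word characters, a colon, a four-digit
# year, then exactly two letters that end the key (checked with \b).
CITEKEY = re.compile(r"(\w+:[0-9]{4})[A-Za-z]{2}\b")


def strip_citekey_suffix(text):
    """Return text with the two-letter suffix removed from every citekey."""
    return CITEKEY.sub(r"\1", text)


def rewrite_file(src_path, dst_path):
    """Read src_path, strip the suffixes, and write the result to dst_path."""
    with open(src_path) as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        dst.write(strip_citekey_suffix(text))
```

Calling rewrite_file('temp.txt', 'out.txt') would then mirror the snippet above. Writing to a separate output file keeps the original bibliography untouched in case the pattern matches more than intended.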

edited Nov 7, 2012 at 12:14

answered Nov 7, 2012 at 11:37

RParadox


5 Comments

Mark Over a year ago

Why split the file into lines? If you are taking this approach you may as well do it for the full file.

Yevgen Yampolskiy Over a year ago

user996018 probably wants to capture (xxxxxx), not replace it. RParadox, use with when dealing with files instead of open/close.

heltonbiker Over a year ago

The OP obviously doesn't want to replace the hardcoded string xxxxx:2009tb, but actually a PATTERN containing some (undefined) string followed by a colon and a year date and some letters.

heltonbiker Over a year ago

Now he wants to REMOVE tb and keep the date (2009 in the example), and not remove the date like your last edit suggests. It's difficult to guess what to do given the limited info provided by the question...

user996018 Over a year ago

@heltonbiker I am sorry about the delay in replying. As you suggested, the string 2009 is not hardcoded. It could be anything like rtwruyeqy:2008xc or ahdjkhjk:2005gf or djkhdkjhjk:1999gh... It's basically the author and year followed by two letters. Thanks for the responses.

heltonbiker · Accepted Answer · 2012-11-08 15:27:10Z

You actually just want to remove the two letters after the year in a reference. Supposing we can uniquely identify a reference as a colon followed by four digits and two letters, then the following regular expression would work (at least it works in this example code):

import re

s = """
according to some works (newton:2009cb), gravity is not the same that
severity (darwin:1873dc; hampton:1956tr).
"""

# raw strings keep \w and \1 from being treated as string escapes
new_s = re.sub(r'(:[0-9]{4})\w{2}', r'\1', s)
print(new_s)

Explanation: "match a colon : followed by four digits [0-9]{4} followed by any two "word" characters \w{2}". The parentheses capture just the part you want to keep, and r'\1' means you are replacing each whole match with the smaller part of it that is in the first (and only) group of parentheses. The r before the string makes it a raw string, so \1 is passed to re.sub as a backreference instead of being interpreted by Python as an escape sequence. Note that \w matches digits and underscores as well as letters.
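To see what the capture group and the character class each do, here is a small comparison. It is a sketch, not part of the original answer: the first three keys come from the comments on the question, while smith:2010_1 is a made-up key added only to show where \w{2} and a letters-only class differ.

```python
import re

# Three keys from the question's comments, plus one hypothetical key
# ("smith:2010_1") added to show where the two patterns differ.
keys = ["rtwruyeqy:2008xc", "ahdjkhjk:2005gf", "djkhdkjhjk:1999gh", "smith:2010_1"]

loose = re.compile(r"(:[0-9]{4})\w{2}")         # \w also matches digits and "_"
strict = re.compile(r"(:[0-9]{4})[A-Za-z]{2}")  # ASCII letters only

# In both cases the whole match is replaced by group 1 (the colon and year).
loose_result = [loose.sub(r"\1", k) for k in keys]
strict_result = [strict.sub(r"\1", k) for k in keys]

print(loose_result)
print(strict_result)
```

With the loose pattern the made-up key loses its _1 suffix, while the strict pattern leaves it alone. Which behaviour is right depends on what the bibliography generator actually emits.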

Hope this helps!
