python iterate over a file and replace strings

Question

I'm using the 're' library to replace occurrences of different strings in multiple files. The replacement pattern works fine, but I'm not able to maintain the changes to the files. I'm trying to get the same functionality that comes with the following lines:

    with open(KEY_FILE, mode='r', encoding='utf-8-sig') as f:
        replacements = csv.DictReader(f)
        user_data = open(temp_file, 'r').read()

        for col in replacements:
            user_data = user_data.replace(col[ORIGINAL_COLUMN], col[TARGET_COLUMN])

        data_output = open(f"{temp_file}", 'w')
        data_output.write(user_data)
        data_output.close()

The key line here is:

user_data = user_data.replace(col[ORIGINAL_COLUMN], col[TARGET_COLUMN])

It carries the changes forward by assigning the result of replace() back to user_data on every iteration.

I need to do the same but with the 're' library:

    with open(KEY_FILE, mode='r', encoding='utf-8-sig') as f:
        replacements = csv.DictReader(f)
        user_data = open(temp_file, 'r').read()
        a = open(f"{test_file}", 'w')

        for col in replacements:
            original_str = col[ORIGINAL_COLUMN]
            target_str = col[TARGET_COLUMN]
            compiled = re.compile(re.escape(original_str), re.IGNORECASE)
            result = compiled.sub(target_str, user_data)
            a.write(result)

I only end up with the last item in the .csv dict changed in the output file. Can't seem to get the changes made in previous iterations of the for loop to persist.

I know that it is pulling from the same file each time... which is why it is getting reset each loop, but I can't sort out a workaround.

Thanks

Just as in the first version, you need to assign the result of sub() back to the same variable.
– Barmar, Commented Oct 26, 2021 at 17:08
And a.write() should be after the loop, just like in the first version.
– Barmar, Commented Oct 26, 2021 at 17:09
This open(f"{test_file}", 'w') is weird as well. Just write open(test_file, 'w').
– user8563312, Commented Oct 26, 2021 at 17:13

jwd · Accepted Answer · 2021-10-26 18:30:51Z

Try something like this?

    #!/usr/bin/env python3

    import csv
    import re
    import sys
    from io import StringIO

    KEY_FILE = '''aaa,bbb
    xxx,yyy
    '''
    TEMP_FILE = '''here is aaa some text xxx
    bla bla aaaxxx
    '''
    ORIGINAL_COLUMN = 'FROM'
    TARGET_COLUMN = 'TO'

    user_data = StringIO(TEMP_FILE).read()

    with StringIO(KEY_FILE) as f:
        reader = csv.DictReader(f, ['FROM','TO'])
        for row in reader:
            original_str = row[ORIGINAL_COLUMN]
            target_str = row[TARGET_COLUMN]
            compiled = re.compile(re.escape(original_str), re.IGNORECASE)
            user_data = compiled.sub(target_str, user_data)

    sys.stdout.write("modified user_data:\n" + user_data)

Some things to note:

- The main problem was result = sub(..., user_data) rather than result = sub(..., result). Every iteration started again from the original, unmodified text, so only the last substitution survived. You want to keep updating the same string, which is why the code above assigns the result back to user_data.
- Compiling the regex is fairly pointless in this case, since each pattern is used only once. Calling re.sub(pattern, replacement, text, flags=re.IGNORECASE) directly does the same job.
- I don't have access to your test files, so I used inline StringIO versions and printed to stdout; hopefully that's easy enough to translate back to your real code (:
- In future posts, you might consider doing the same, so that your question has 100% runnable code someone else can try out without guessing.
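For translating this back to real files, here is a minimal sketch under a few assumptions: the function name apply_replacements and its parameters are made up for illustration, the key CSV has a header row (as the DictReader call in the question implies), and the data file is UTF-8. It also passes a function as the replacement, which is a change from the code above: re.sub then inserts the target text verbatim instead of interpreting backslashes or group references in it.

```python
import csv
import re
from pathlib import Path


def apply_replacements(key_file, data_file, original_column, target_column):
    """Apply every replacement listed in key_file to data_file, ignoring case.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    text = Path(data_file).read_text(encoding="utf-8")

    with open(key_file, mode="r", encoding="utf-8-sig", newline="") as f:
        for row in csv.DictReader(f):
            target = row[target_column]
            # Reassign text on every pass so each substitution builds on
            # the previous one. The function replacement returns target
            # as-is, so "\1" or "\n" in the CSV is not treated as an escape.
            text = re.sub(
                re.escape(row[original_column]),
                lambda _match: target,
                text,
                flags=re.IGNORECASE,
            )

    # Write once, after all replacements have been applied.
    Path(data_file).write_text(text, encoding="utf-8")
    return text
```

Calling apply_replacements(KEY_FILE, temp_file, ORIGINAL_COLUMN, TARGET_COLUMN) with the names from the question should overwrite temp_file with the fully substituted text. If the replacement values never contain backslashes, passing the target string directly to re.sub works just as well.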
