I'm using the 're' library to replace occurrences of different strings in multiple files. The replacement pattern works fine, but I can't get the changes to persist in the output file. I'm trying to get the same functionality that comes with the following lines:

    with open(KEY_FILE, mode='r', encoding='utf-8-sig') as f:
        replacements = csv.DictReader(f)
        user_data = open(temp_file, 'r').read()

        for col in replacements:
            user_data = user_data.replace(col[ORIGINAL_COLUMN], col[TARGET_COLUMN])

        data_output = open(f"{temp_file}", 'w')
        data_output.write(user_data)
        data_output.close()

The key line here is:

    user_data = user_data.replace(col[ORIGINAL_COLUMN], col[TARGET_COLUMN])

It takes care of updating the data in place using the replace method.

I need to do the same but with the 're' library:

    with open(KEY_FILE, mode='r', encoding='utf-8-sig') as f:
        replacements = csv.DictReader(f)
        user_data = open(temp_file, 'r').read()
        a = open(f"{test_file}", 'w')

        for col in replacements:
            original_str = col[ORIGINAL_COLUMN]
            target_str = col[TARGET_COLUMN]
            compiled = re.compile(re.escape(original_str), re.IGNORECASE)
            result = compiled.sub(target_str, user_data)
            a.write(result)

I only end up with the last item in the .csv dict changed in the output file. Can't seem to get the changes made in previous iterations of the for loop to persist.

I know that it is pulling from the same file each time... which is why it is getting reset each loop, but I can't sort out a workaround.

Thanks

  • Just as in the first version, you need to assign the result of sub() back to the same variable. Commented Oct 26, 2021 at 17:08
  • user_data = compiled.sub(...) Commented Oct 26, 2021 at 17:08
  • And a.write() should be after the loop, just like in the first version. Commented Oct 26, 2021 at 17:09
  • Why aren't you using with for the output files? Commented Oct 26, 2021 at 17:09
  • This open(f"{test_file}", 'w') is weird as well. Just write open(test_file, 'w'). Commented Oct 26, 2021 at 17:13

1 Answer

Try something like this?

    #!/usr/bin/env python3

    import csv
    import re
    import sys
    from io import StringIO

    KEY_FILE = '''aaa,bbb
    xxx,yyy
    '''
    TEMP_FILE = '''here is aaa some text xxx
    bla bla aaaxxx
    '''
    ORIGINAL_COLUMN = 'FROM'
    TARGET_COLUMN = 'TO'

    user_data = StringIO(TEMP_FILE).read()

    with StringIO(KEY_FILE) as f:
        reader = csv.DictReader(f, ['FROM','TO'])
        for row in reader:
            original_str = row[ORIGINAL_COLUMN]
            target_str = row[TARGET_COLUMN]
            compiled = re.compile(re.escape(original_str), re.IGNORECASE)
            user_data = compiled.sub(target_str, user_data)

    sys.stdout.write("modified user_data:\n" + user_data)
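
One caveat with the loop above, in case the target strings in your CSV can contain backslashes: sub() processes escape sequences in a string replacement, so a target such as C:\data would raise an error instead of being inserted as written. Passing a function as the replacement sidesteps that, because its return value is used literally. A minimal sketch, where replace_literal is a hypothetical helper name and not something from the question:

```python
import re


def replace_literal(text, original, target):
    """Case-insensitively replace `original` with `target`, both taken literally."""
    pattern = re.escape(original)
    # The return value of a callable replacement is inserted as-is:
    # no backslash or group-reference processing is applied to it.
    return re.sub(pattern, lambda match: target, text, flags=re.IGNORECASE)


print(replace_literal("open AAA now", "aaa", r"C:\data"))  # prints: open C:\data now
```

If your targets never contain backslashes, the plain string replacement in the loop above behaves the same way.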

Some things to note:

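  • The main problem was that every iteration called sub(..., user_data) on the original, unmodified string and stored the output in a separate result variable, so each replacement discarded the ones before it. Assign the output back to the same variable (user_data = compiled.sub(target_str, user_data)) so every replacement builds on the previous one, and write the file once, after the loop.
  • Compiling each regex is fairly pointless in this case, since each pattern is used only once. re.sub(re.escape(original_str), target_str, user_data, flags=re.IGNORECASE) does the same thing.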
  • I don't have access to your test files, so I used StringIO versions inline and printing to stdout; hopefully that's easy enough to translate back to your real code (:
    • In future posts, you might consider doing similar, so that your question has 100% runnable code someone else can try out without guessing.
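
For reference, translating the StringIO demo back to real files might look roughly like the following. This is only a sketch under a few assumptions: apply_replacements and replace_in_file are hypothetical names, the CSV is assumed to have a header row containing the FROM and TO column names (as the DictReader call without field names in the question implies), and the input file is assumed to be UTF-8.

```python
import csv
import re


def apply_replacements(text, pairs):
    """Apply each (original, target) pair in order, case-insensitively."""
    for original, target in pairs:
        if not original:
            # An empty key would match at every position, so skip it.
            continue
        # Rebind `text` so each substitution builds on the previous one.
        # The lambda keeps `target` literal (no backslash processing).
        text = re.sub(re.escape(original), lambda _match: target, text,
                      flags=re.IGNORECASE)
    return text


def replace_in_file(key_file, src_file, dst_file,
                    original_column="FROM", target_column="TO"):
    """Read src_file, apply the CSV replacements, and write dst_file once."""
    with open(key_file, mode="r", encoding="utf-8-sig", newline="") as f:
        pairs = [(row[original_column], row[target_column])
                 for row in csv.DictReader(f)]
    with open(src_file, mode="r", encoding="utf-8") as f:
        user_data = f.read()
    user_data = apply_replacements(user_data, pairs)
    # Write after all replacements are done, not inside the loop.
    with open(dst_file, mode="w", encoding="utf-8") as f:
        f.write(user_data)
```

Passing the same path as src_file and dst_file overwrites the file in place, like the first version in the question. Because the pairs are applied in order, a later pair can also rewrite text produced by an earlier one.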