Python replace string in text file with value from list

Question

My problem is to replace strings in a text file with other strings. These key strings are in a list called word_list. I've tried the following, but nothing seems to work. It prints out the sentence in document.text as it appears, with no replacement:

    word_list = {'hi' : 'test', 'how' : 'teddy'}

    with open("document.txt") as main:
        words = main.read().split()

    replaced = []
    for y in words:
        replacement = word_list.get(y, y)
        replaced.append(replacement)
    text = ' '.join(word_list.get(y, y) for y in words)

    print text

    new_main = open("done.txt", 'w')
    new_main.write(text)
    new_main.close()

Content of document.txt:

   hi you, how is he?

Current output is the same as document.txt when it should be:

   test you, teddy is he?

Any solutions/ help would be appreciated :)

Why go through all of this when you can use the replace method?
– The6thSense, Commented Oct 13, 2015 at 12:57
word_list is, despite its name, a dictionary... Also, you completely ignore replaced when you create the text at the end, preferring instead to use a generator expression.
– jonrsharpe, Commented Oct 13, 2015 at 12:58
"It prints out the sentence in document.text" -> you did check the done.txt file, right?
– grc, Commented Oct 13, 2015 at 13:02

panda-34 · Accepted Answer · 2015-10-13 13:11:40Z

2

As you seem to want to replace words, this will use a more natural definition of 'word':

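import re
word_list = {'hi' : 'test', 'how' : 'teddy'}
with open('document.txt') as main, open('done.txt', 'w') as done:
    text = main.read()
    # \b\w+\b matches each whole word; punctuation stays outside the match.
    # Words found in word_list are swapped, all others are written back unchanged.
    done.write(re.sub(r'\b\w+\b', lambda x: word_list.get(x.group(), x.group()), text))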
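To see what the substitution does without touching any files, the same call can be wrapped in a small function and run on the sentence from the question. This is only a sketch written for Python 3; the function name replace_words and the extra word "which" in the sample are assumptions added for illustration.

```python
import re

# Mapping from the question; despite its name it is a dict, not a list.
word_list = {'hi': 'test', 'how': 'teddy'}


def replace_words(text, mapping):
    """Replace each whole word that is a key in mapping; leave the rest as is."""
    # \b\w+\b matches every run of word characters, so "you," is seen as the
    # word "you" followed by an untouched comma.
    return re.sub(r'\b\w+\b', lambda m: mapping.get(m.group(), m.group()), text)


sample = 'hi you, how is he? which'
result = replace_words(sample, word_list)
print(result)  # test you, teddy is he? which
```

Because the lookup is done on the whole matched word, "which" is left alone even though it contains "hi".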

answered Oct 13, 2015 at 13:11


6 Comments

tobias_k Over a year ago

Better use r"\b(" + "|".join(word_list) + r")\b" as the pattern, so you don't needlessly replace every word. Also, \w would not work if those words contain unusual characters.
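For reference, the pattern suggested in this comment could be built roughly as follows. This is a sketch in Python 3, not code from the thread: the call to re.escape is an addition (it keeps keys containing regex metacharacters from being read as pattern syntax), and the function name replace_keys is made up for the example.

```python
import re

word_list = {'hi': 'test', 'how': 'teddy'}

# One alternation of all keys, wrapped in word boundaries: \b(hi|how)\b
# re.escape is an added precaution for keys such as "a.b" or "c++".
pattern = re.compile(r'\b(' + '|'.join(re.escape(k) for k in word_list) + r')\b')


def replace_keys(text):
    """Substitute only the words that are keys of word_list."""
    return pattern.sub(lambda m: word_list[m.group(1)], text)


alt_result = replace_keys('hi you, how is he? which')
print(alt_result)  # test you, teddy is he? which
```

Note that \b only marks a position between a word character and a non-word character, so keys that start or end with punctuation would need a different kind of boundary check.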

panda-34 Over a year ago

@tobias_k, testing each word for N variants is O(N) while needlessly replacing it is O(1) (though likely a longer operation). Where the balance lies is a matter of profiling. And \w is by definition a word symbol.

panda-34 Over a year ago

If you have N alternatives, each has to be checked until the correct one is found; there's nothing else the regexp engine could do.

panda-34 Over a year ago

A Python set is no different from a dict; both are O(1) on average.

panda-34 Over a year ago

@tobias_k, I tested it on a meagre 10000-word dictionary, and sure enough, the joined regexp sub operation runs 425 times slower.
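The comparison discussed in these comments can be reproduced with timeit. The sketch below (Python 3) uses made-up data, 1000 generated keys and a 5000-word text, not the 10000-word dictionary mentioned above, so the ratio it prints will differ from the figure quoted and will depend on the machine and Python version.

```python
import re
import timeit

# Hypothetical test data: generated keys and text, chosen only for illustration.
mapping = {'word%d' % i: 'repl%d' % i for i in range(1000)}
text = ' '.join('word%d' % (i % 2000) for i in range(5000))

every_word = re.compile(r'\b\w+\b')
alternation = re.compile(r'\b(' + '|'.join(map(re.escape, mapping)) + r')\b')


def sub_every_word():
    # Match every word, then do an O(1) average dict lookup per word.
    return every_word.sub(lambda m: mapping.get(m.group(), m.group()), text)


def sub_alternation():
    # Let the regex engine decide which of the keys (if any) matches.
    return alternation.sub(lambda m: mapping[m.group(1)], text)


# Both approaches should produce the same text before timing means anything.
same_output = sub_every_word() == sub_alternation()

t_every = timeit.timeit(sub_every_word, number=3)
t_alt = timeit.timeit(sub_alternation, number=3)
print('every word: %.4fs, alternation: %.4fs' % (t_every, t_alt))
```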


Christian Witts · Accepted Answer · 2015-10-13 13:02:21Z

0

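word_list = {'hi' : 'test', 'how' : 'teddy'}

with open("document.txt") as main:
    with open('done.txt', 'w') as new_main:
        # Read the whole input file into memory.
        input_data = main.read()
        # Apply each replacement in turn. items() works on both Python 2 and 3
        # (iteritems() exists only on Python 2).
        for key, value in word_list.items():
            input_data = input_data.replace(key, value)

        new_main.write(input_data)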

This will read the entire contents of the file (not the most efficient if it's a large file), then iterate over your search and replace items in your dictionary, and call replace on the input text. Once complete it will write the data out to your new file.

Some things to remember with this approach:

- If your input file is large, it will be slow: the whole file is held in memory and scanned once per dictionary entry.
- Your search pattern can also match word fragments, i.e. hi will match inside which, so you should cater for that too.
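The word-fragment caveat is easy to check on plain strings. The sketch below (Python 3) wraps the loop from this answer in a function named naive_replace, a name chosen here for illustration. The last example shows one further effect that the answer does not mention: because the replacements are applied one after another, the output of an earlier replacement can itself be replaced by a later one.

```python
word_list = {'hi': 'test', 'how': 'teddy'}


def naive_replace(text, mapping):
    """Apply str.replace once per dictionary entry, as in the answer above."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


sentence = naive_replace('hi you, how is he?', word_list)
fragments = naive_replace('which show', word_list)
# Dicts keep insertion order on Python 3.7+, so 'a' -> 'b' runs before 'b' -> 'c'.
chained = naive_replace('a', {'a': 'b', 'b': 'c'})

print(sentence)   # test you, teddy is he?
print(fragments)  # wtestch steddy
print(chained)    # c
```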

edited Oct 13, 2015 at 13:02

answered Oct 13, 2015 at 12:59


3 Comments

Christian Witts Over a year ago

I know, which is why I was editing it to include some caveats of the naive approach.

user47467 Over a year ago

@ChristianWitts your solution works, thanks :) However, if I wanted to avoid replacing word fragments of other strings, how would this be done?

tobias_k Over a year ago

@user47467 By using a regular expression with \b for "word boundary", as in panda's answer, but I'd suggest using a different regex... see comments.
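Putting the word-boundary advice together with the large-file caveat, one possible variant processes the input line by line so that only one line is in memory at a time. This is a sketch in Python 3 under a few assumptions: the function name replace_in_file is invented, the demo writes its own document.txt into a temporary directory, and no key is expected to span a line break.

```python
import os
import re
import tempfile

word_list = {'hi': 'test', 'how': 'teddy'}
word_re = re.compile(r'\b\w+\b')


def replace_in_file(src_path, dst_path, mapping):
    """Copy src_path to dst_path, replacing whole words one line at a time."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:  # iterating a file object yields one line per step
            dst.write(word_re.sub(lambda m: mapping.get(m.group(), m.group()), line))


# Demo in a temporary directory, using the file names from the question.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'document.txt')
dst_path = os.path.join(workdir, 'done.txt')
with open(src_path, 'w') as f:
    f.write('hi you, how is he?\nwhich one is this?\n')

replace_in_file(src_path, dst_path, word_list)
with open(dst_path) as f:
    output = f.read()
print(output)
```

With this input, the first line becomes "test you, teddy is he?" and the second line is written back unchanged, since "which" and "this" only contain "hi" as a fragment.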
