My problem is to replace strings in a text file with other strings. The key strings are stored in a dictionary called word_list. I've tried the following, but nothing seems to work: it prints the sentence in document.txt exactly as it appears, with no replacement:

    word_list = {'hi': 'test', 'how': 'teddy'}

    with open("document.txt") as main:
        words = main.read().split()

    replaced = []
    for y in words:
        replacement = word_list.get(y, y)
        replaced.append(replacement)
    text = ' '.join(word_list.get(y, y) for y in words)

    print text

    new_main = open("done.txt", 'w')
    new_main.write(text)
    new_main.close()

Content of document.txt:

   hi you, how is he?

Current output is the same as document.txt when it should be:

   test you, teddy is he?

Any solutions/ help would be appreciated :)

  • Why do you want to go through all of this when you can use the replace method? Commented Oct 13, 2015 at 12:57
  • word_list is, despite its name, a dictionary... Also, you completely ignore replaced when you create the text at the end, preferring instead to use a generator expression. Commented Oct 13, 2015 at 12:58
  • And your code works for me? Commented Oct 13, 2015 at 13:02
  • "It prints out the sentence in document.text" -> you did check the done.txt file, right? Commented Oct 13, 2015 at 13:02
  • Your code works fine. What is your problem? Commented Oct 13, 2015 at 13:05

2 Answers

As you seem to want to replace words, this will use a more natural definition of 'word':

import re
word_list = {'hi' : 'test', 'how' : 'teddy'}
with open('document.txt') as main, open('done.txt', 'w') as done:
    text = main.read()
    done.write(re.sub(r'\b\w+\b', lambda x: word_list.get(x.group(), x.group()), text))
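
To illustrate how this differs from splitting on whitespace, here is a minimal sketch that runs both approaches on an in-memory string. The helper name replace_words and the sample sentence are made up for illustration and are not part of the original answer.

```python
import re

word_list = {'hi': 'test', 'how': 'teddy'}

def replace_words(text, mapping):
    """Replace each whole word found in `mapping`; leave other text as-is."""
    return re.sub(r'\b\w+\b',
                  lambda m: mapping.get(m.group(), m.group()),
                  text)

# Hypothetical sample: note the comma glued to 'hi'.
sample = 'hi, how is he? which'

# str.split() yields 'hi,' as one token, so the dictionary lookup misses it.
by_split = ' '.join(word_list.get(w, w) for w in sample.split())

# The regex matches 'hi' on its own and keeps the punctuation in place.
by_regex = replace_words(sample, word_list)

print(by_split)  # hi, teddy is he? which
print(by_regex)  # test, teddy is he? which
```

Unlike the split/join approach, the regex version also preserves the original whitespace and line breaks, since only the matched words are rewritten.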

6 Comments

Better to use r"\b(" + "|".join(word_list) + r")\b" as the pattern, so you don't needlessly replace every word. Also, \w would not work if those words contain unusual characters.
@tobias_k, testing each word for N variants is O(N) while needlessly replacing it is O(1) (though likely a longer operation). Where the balance lies is a matter of profiling. And \w is by definition a word symbol.
If you have N alternatives, each has to be checked until the correct one is found; there's nothing else the regexp engine could do.
A Python set is no different from a dict; both are O(1) on average.
@tobias_k, I tested it on a meagre 10000-word dictionary, and sure enough, the joined regexp sub operation runs 425 times slower.
word_list = {'hi' : 'test', 'how' : 'teddy'}

with open("document.txt") as main:
    with open('done.txt', 'w') as new_main:
        input_data = main.read()
        # items() works on both Python 2 and 3; iteritems() is Python 2 only
        for key, value in word_list.items():
            input_data = input_data.replace(key, value)

        new_main.write(input_data)

This will read the entire contents of the file (not the most efficient if it's a large file), then iterate over your search and replace items in your dictionary, and call replace on the input text. Once complete it will write the data out to your new file.

Some things to remember with this approach:

  • if your input file is large, it will be slow
  • your search pattern can also match word fragments, i.e. hi will match inside which, so you should cater for that too.
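
One possible way to address both caveats is to process the file one line at a time and match whole words only. The sketch below swaps str.replace for the \b\w+\b regex from the other answer; the function names replace_in_lines and replace_in_file are hypothetical, and it assumes no key spans a line break.

```python
import re

WORD = re.compile(r'\b\w+\b')

def replace_in_lines(lines, mapping):
    """Yield each line with whole-word replacements applied."""
    for line in lines:
        yield WORD.sub(lambda m: mapping.get(m.group(), m.group()), line)

def replace_in_file(src_path, dst_path, mapping):
    """Stream src_path into dst_path without loading the whole file."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        # A file object iterates line by line, so memory use stays small.
        dst.writelines(replace_in_lines(src, mapping))

# Hypothetical usage with the file names from the question:
# replace_in_file('document.txt', 'done.txt', {'hi': 'test', 'how': 'teddy'})
```

Because only whole words are looked up, which is left alone even though it contains hi.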

3 Comments

I know, which is why I was editing it to include some caveats of the naive approach.
@ChristianWitts your solution works, thanks :) However, if I wanted to avoid replacing word fragments of other strings, how would this be done?
@user47467 By using a regular expression with \b for "word boundary", as in panda's answer, but I'd suggest using a different regex... see comments.
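
A minimal sketch of the word-boundary pattern built from the dictionary keys, as suggested in the comments, might look like the following. The helper names build_pattern and replace_keys are made up here, and re.escape and the longest-first sort are additions that the comment did not mention. It also assumes every key starts and ends with a word character, since \b would not behave as intended otherwise.

```python
import re

def build_pattern(mapping):
    """Compile \\b(key1|key2|...)\\b from the mapping's keys."""
    # Longest keys first, so a longer key wins over a shorter prefix.
    keys = sorted(mapping, key=len, reverse=True)
    alternation = '|'.join(re.escape(k) for k in keys)
    return re.compile(r'\b(?:' + alternation + r')\b')

def replace_keys(text, mapping):
    """Replace only whole-word occurrences of the mapping's keys."""
    if not mapping:
        return text  # an empty alternation would match empty strings
    return build_pattern(mapping).sub(lambda m: mapping[m.group()], text)

word_list = {'hi': 'test', 'how': 'teddy'}
print(replace_keys('hi you, how is he? which', word_list))
# test you, teddy is he? which
```

Whether this is faster or slower than matching every word with \b\w+\b depends on the size of the dictionary, as the timing comment on the other answer suggests, so it is worth profiling on real data.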
