I just installed Anaconda on a Windows 10 machine (Python 2.7.12 |Anaconda 4.2.0 (64-bit)|), and I am having an issue reading text from a file. Please see the code and output below. I want the actual text from the file.

Thanks!!

Output:

['\xff\xfeT\x00h\x00i\x00s\x00',
 '\x00i\x00s\x00',
 '\x00a\x00',
 '\x00t\x00e\x00s\x00t\x00.\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00T\x00h\x00i\x00s\x00',
 '\x00i\x00s\x00',
 '\x00a\x00',
 '\x00t\x00e\x00s\x00t\x00']

Code:

try:
    with open('test.txt', 'r') as f:
        text = f.read()
    print text.split()
except Exception as e:
    print e

test.txt:

This is a test.

This is a test

Comments:
  • Thanks. The text in the file was using encoding = "Unicode". Changed to "Ansi", and it works fine now. Commented Mar 7, 2017 at 0:21
  • If you've gotten an answer that best meets your needs, feel free to mark that answer as accepted. Commented Mar 7, 2017 at 17:29

2 Answers

I've had the best luck with using the io module to open the file with an explicit encoding.

import io
with io.open(FILE, 'r', encoding='utf-16') as f:
    job = f.read()
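To see why this helps, here is a minimal sketch that writes a UTF-16 file and then reads it back both ways. The temporary path and the sample text are assumptions made up for the demonstration (they stand in for the asker's test.txt), and the code is written so it should run on Python 2.7 and Python 3 alike.

```python
import io
import os
import tempfile

# Hypothetical sample file, written as UTF-16 to mimic a file saved
# with Notepad's "Unicode" option.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with io.open(path, 'w', encoding='utf-16') as f:
    f.write(u'This is a test.\n\nThis is a test')

# Binary read: the undecoded bytes, starting with the byte order mark.
with io.open(path, 'rb') as f:
    raw = f.read()

# Text read with an explicit encoding: the codec consumes the byte
# order mark and the NUL padding bytes, leaving only the characters.
with io.open(path, 'r', encoding='utf-16') as f:
    text = f.read()

words = text.split()
print(repr(raw[:2]))
print(words)
```

The first two raw bytes are the byte order mark (the '\xff\xfe' at the start of the output in the question), and splitting the decoded text yields plain words with no '\x00' entries.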

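You have an issue with the text encoding. Your file is encoded in UTF-16, not UTF-8: the leading '\xff\xfe' in your output is the UTF-16 little-endian byte order mark, and the '\x00' bytes are the second byte of each character. Instead of using open, use: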

import codecs
with codecs.open("test.txt", "r", encoding="utf-16") as f:
    text = f.read()

Or switch to Python 3, which has much better support for Unicode.
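If you are not sure which encoding a file uses, one option is to check its first bytes for a byte order mark before opening it. The sketch below is only an illustration: guess_encoding is a hypothetical helper (not part of any library), the 'utf-8' fallback is an assumption, and it ignores UTF-32 and files without a byte order mark.

```python
import codecs
import io
import os
import tempfile


def guess_encoding(path, default='utf-8'):
    """Hypothetical helper: pick a codec name from the byte order mark.

    Only UTF-8 and UTF-16 marks are checked; UTF-32 is not handled,
    and files without a mark fall back to `default` (an assumption).
    """
    with io.open(path, 'rb') as f:
        head = f.read(3)
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    return default


# Demonstration on a made-up UTF-16 file.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with io.open(path, 'w', encoding='utf-16') as f:
    f.write(u'This is a test.')

encoding = guess_encoding(path)
with io.open(path, 'r', encoding=encoding) as f:
    text = f.read()

print(encoding)
print(text.split())
```

Passing the detected name to the encoding argument works the same way with codecs.open in Python 2 and with the built-in open in Python 3.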
