I have the following problem. I have a German text saved as a UTF-8 .txt file, and I'd like to print it with Python. Here's my code:

txt = open(filename, 'r').read()
print txt.decode('utf-8-sig')

It works perfectly in IDLE, but when I save my code and run it from the command prompt, it raises an error, specifically:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-4: character maps to <undefined>

In my particular case, the text is "gemäßigt", and at the beginning of the .py file I put something like

# -*- coding: utf-8-sig -*-

By the way, my OS is Windows, in Russian. Does anybody have an idea what my problem is?

Best, Alex

3 Comments
  • What do you get when you import sys and then sys.stdout.encoding in the console? Commented May 25, 2014 at 12:39
  • That's your problem, that codepage doesn't support German special characters. What do you get when you try the above command in IDLE? Commented May 25, 2014 at 12:50
  • In IDLE I get exactly the same word Commented May 25, 2014 at 13:24
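The check asked for in the comments above can be sketched roughly as follows. `console_encoding` is a hypothetical helper name, not something Python provides, and the fallback to `locale.getpreferredencoding` is an assumption for the case where `sys.stdout.encoding` is `None` (for example when output is piped to a file).

```python
import locale
import sys


def console_encoding():
    """Return the encoding used for stdout, with a fallback (hypothetical helper)."""
    # sys.stdout.encoding can be None or missing when output is redirected.
    enc = getattr(sys.stdout, "encoding", None)
    return enc or locale.getpreferredencoding(False)


print(console_encoding())
```

The name printed is the codec that `print` has to encode Unicode text into, so any character missing from that codec will trigger a UnicodeEncodeError.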

2 Answers

Your console uses DOS code page 866, which has no characters for ä or ß; that is what causes the error.

You could try calling .encode('cp866', errors='replace') on your string before output, which replaces every character your terminal doesn't support with a ?.
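As a minimal sketch of that suggestion, the snippet below reproduces the failure without a console and then applies the `errors='replace'` workaround. The word is written with escapes so the snippet does not depend on the source file's encoding; the variable names are illustrative only.

```python
word = u"gem\u00e4\u00dfigt"  # "gemäßigt"

failed = None
try:
    word.encode("cp866")
except UnicodeEncodeError as exc:
    # exc.start is the first bad index, exc.end is one past the last bad index.
    failed = (exc.start, exc.end)

# With errors="replace", unsupported characters become "?" instead of raising.
safe = word.encode("cp866", errors="replace")
print(failed)
print(safe)
```

The failing indices cover ä and ß, which matches the "position 3-4" in the traceback from the question. The replaced text is lossy, so this only helps when an approximate display is acceptable.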

3 Comments

Somehow, it does not work. I'll google again later and write up the solution if I find one. Many thanks for your suggestions on how to solve it!
Maybe I was unclear; you first need to decode it from UTF-8, then encode that to cp866: print txt.decode('utf-8-sig').encode('cp866', errors='replace'). Did you try it that way?
@Alekz112: if you don't need to output both latin and cyrillic characters, you can try changing the codepage (chcp 850 at the DOS prompt, then python.exe myscript.py). That at least allows you to output words like "gemäßigt".
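The trade-off behind the `chcp 850` suggestion can be checked from Python itself. This is a sketch only: `can_encode` is a hypothetical helper, and the Russian sample word is an assumption added for contrast.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


german = u"gem\u00e4\u00dfigt"                      # "gemäßigt"
russian = u"\u043f\u0440\u0438\u0432\u0435\u0442"    # "привет" (assumed sample)

for codec in ("cp866", "cp850"):
    print("%s: german=%s russian=%s" % (
        codec, can_encode(german, codec), can_encode(russian, codec)))
```

Each code page covers only one of the two words, which is why switching to 850 helps for German output but is no use if Cyrillic has to be printed in the same session.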

Is your text in UTF-8 or utf-8-sig? They're not exactly the same. You can learn the difference here: https://docs.python.org/3/library/codecs.html#encodings-and-unicode
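To make the difference concrete, here is a small sketch that decodes the same bytes with both codecs. The byte string is built in memory and stands in for a file saved with a byte order mark (BOM); the variable names are illustrative.

```python
import codecs

word = u"gem\u00e4\u00dfigt"  # "gemäßigt"
raw = codecs.BOM_UTF8 + word.encode("utf-8")

plain = raw.decode("utf-8")    # BOM is kept as U+FEFF at index 0
sig = raw.decode("utf-8-sig")  # BOM is stripped if present

print("%r %d %d" % (plain[0], len(plain), len(sig)))

# utf-8-sig also accepts input that has no BOM at all.
no_bom = word.encode("utf-8").decode("utf-8-sig")
```

So utf-8-sig is the safer choice for reading when it is unknown whether the editor wrote a BOM.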

You can also open the text file so that it is decoded as it is read:

import codecs
txt = codecs.open(filename,'r',"utf-8-sig").read()
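A self-contained variant of the same idea is sketched below. It uses `io.open` rather than `codecs.open` (a swapped-in but equivalent approach for this purpose, available in Python 2.7 and 3), and a temporary file stands in for the asker's file, which is an assumption.

```python
import io
import os
import tempfile

# Create a small UTF-8 file with a BOM, standing in for the real text file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8-sig") as f:
    f.write(u"gem\u00e4\u00dfigt")

# The file object decodes while reading, so no separate .decode() is needed.
with io.open(path, "r", encoding="utf-8-sig") as f:
    txt = f.read()

os.remove(path)
print(repr(txt))
```

This only changes how the text is read. Printing `txt` to a console whose code page lacks ä or ß would still fail in the same way.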

I think Tim is correct about the console problem.

1 Comment

Yeah, I suspected a console problem; I just can't figure out how to fix it.
