Python, Source-Code Encoding Problem

Question

I'm using the Notepad++ editor on Windows with the format set to ASCII. I've read "PEP 263: Source Code Encodings" and amended my code accordingly (I think), but some characters are still printing in hex...

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys

a_munge = [ "A", "4", "/\\", "\@", "/-\\", "^", "aye", "?" ]
b_munge = [ "B", "8", "13", "I3", "|3" , "P>", "|:", "!3", "(3", "/3", "3","]3" ]
c_munge = [ "C", "<", "(", "{", "(c)" ]
d_munge = [ "D", "|)", "|o", "?", "])", "[)", "I>", "|>", " ?", "T)", "0", "cl" ]
e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]
         .
         .
         .

So, did you actually try changing Notepad++'s file format to UTF-8?
– SilentGhost, commented Jan 23, 2010 at 13:44

Ignacio Vazquez-Abrams · Accepted Answer · 2010-01-23 13:38:25Z

2

Perhaps you should be using unicode literals (e.g. u'€') instead.


7 Comments

volting Over a year ago

Well, I just tried that and I got:

File "C:\Users\admin\Desktop\python\passgen.py", line 9
e_replace_list = [ "E", "3", "&", u"Ç", "ú", "[-", "|=-", "?" ]
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0x80 in positio unexpected code byte

Ignacio Vazquez-Abrams Over a year ago

1) Your file isn't UTF-8. 2) They should all be unicode literals. farmdev.com/talks/unicode

volting Over a year ago

...informative presentation, thanks, although I'm not sure I'm much wiser... To clarify what you said, 'They should all be unicode literals': when you say 'all', do you mean all characters not included in the ASCII set? I've done this and it runs, but non-ASCII characters are still printed in unicode hex, e.g. € = u'\u20ac'

Ignacio Vazquez-Abrams Over a year ago

Then you should consider showing the code that actually does the work.

volting Over a year ago

print e_munge

This is the way I'm doing it at the moment, just for debugging purposes, but eventually the characters will be printed to a Tkinter GUI.


Mark Tolonen · Accepted Answer · 2010-01-23 15:42:00Z

2

The line:

# -*- coding: UTF-8 -*-

declares that the source file is saved in UTF-8. Anything else is an error.
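The SyntaxError quoted in the comments above (byte 0x80 cannot be decoded as UTF-8) is what one would expect from a file that declares UTF-8 but was actually saved in a Windows single-byte code page. The following is a minimal sketch of that mismatch, not a diagnosis of the asker's exact file: it assumes the code page was cp1252 (the thread does not say), and it is written for Python 3, where bytes and text are separate types.

```python
# Sketch for Python 3. cp1252 is an ASSUMED "ANSI" code page; the thread
# does not confirm which one Notepad++ actually used.
euro = u"\u20ac"  # the Euro sign

saved_as_cp1252 = euro.encode("cp1252")  # bytes an ANSI-saved file would hold
saved_as_utf8 = euro.encode("utf-8")     # bytes a UTF-8-saved file would hold

# Decoding the cp1252 byte as UTF-8 fails, much like the reported error.
try:
    saved_as_cp1252.decode("utf-8")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

print(repr(saved_as_cp1252))
print(repr(saved_as_utf8))
print(type(mismatch_error).__name__)
```

Under that assumption the Euro sign is stored as the single byte 0x80, which is not valid UTF-8 on its own, so the declared encoding and the saved bytes disagree. Saving the file as UTF-8 in the editor removes the mismatch.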

When you declare byte strings in your source code:

e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]

then byte strings like "€" will actually contain the encoded bytes used to save the source file.

When you use Unicode strings instead:

e_munge = [ u"E", u"3", u"&", u"€", u"£", u"[-", u"|=-", u"?" ]

then, when Python reads u followed by the quoted "€" from the source file, it uses the declared encoding to decode those bytes into a Unicode character.

An illustration:

# coding: utf-8
bs = '€'
us = u'€'
print repr(bs)
print repr(us)

OUTPUT:

'\xe2\x82\xac'
u'\u20ac'
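The two lines of output are the same Euro sign in two representations, and encode and decode convert between them. A short sketch of that round trip follows; it is written to run on Python 3 (an assumption beyond the thread, which uses Python 2), and the variable names are illustrative only.

```python
# Sketch: the UTF-8 bytes and the Unicode character shown in the output
# above are two representations of the same Euro sign.
utf8_bytes = b"\xe2\x82\xac"        # what the byte string holds in a UTF-8 file
text = utf8_bytes.decode("utf-8")   # what the u'...' literal holds

round_trip = text.encode("utf-8")   # encoding again gives the original bytes

n_bytes = len(utf8_bytes)  # length counted in bytes
n_chars = len(text)        # length counted in characters
print(n_bytes, n_chars)
```

The byte string has a length of three while the Unicode string has a length of one, which is one practical reason to keep text as Unicode inside the program and encode only at the boundaries.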


1 Comment

volting Over a year ago

OK, I already deduced that, but how do I get it to print out the character € and not the unicode code...

John Machin · Accepted Answer · 2010-01-24 00:40:53Z

1

print some_list is in effect print repr(some_list) -- that's why you see \u20ac instead of a Euro character. For debugging purposes, the "unicode hex" is exactly what you need for unambiguous display of your data.

You appear to have perfectly OK unicode objects in your list; I suggest that you don't "print" the list to Tkinter.
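To see the characters themselves rather than their escapes, format the elements instead of the list. Here is a minimal sketch, assuming Python 3, where ascii() produces the escaped form that Python 2's repr() shows for unicode objects; the names escaped and joined are illustrative.

```python
# Sketch for Python 3. The list mirrors e_munge from the question, with the
# non-ASCII characters written as escapes so the file itself stays ASCII.
e_munge = [u"E", u"3", u"&", u"\u20ac", u"\u00a3", u"[-", u"|=-", u"?"]

# Escaped, unambiguous form: useful for debugging.
escaped = ascii(e_munge)

# Human-readable form: join the elements into one string, then print or
# display that string instead of the list.
joined = u" ".join(e_munge)

print(escaped)
```

Calling print(joined) then shows € and £ directly, provided the console encoding can represent them. In Python 2 the same idea applies: print each element, or a joined string, rather than the list.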


3 Comments

volting Over a year ago

Well, I won't be printing all the lists to Tkinter (at least not at one time). The program will be a simple password generator which will allow a user to input a word that they would like to use for a password. The program will then do a pseudo-random munge of the word and output the result to a Tkinter text box so that the user can copy and paste it wherever... Why do you suggest that I don't output to Tkinter?

John Machin Over a year ago

You said that "the characters will be printed to a Tkinter GUI". I'm merely suggesting that you don't use the Python print statement to send the data to Tkinter for display.

volting Over a year ago

Ok fair enough, I guess my previous comment was a little ambiguous, thanks for your input.
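Passing the string to a widget, rather than printing it, might look like the following. This is a sketch only: show_result is a hypothetical helper that does not appear in the thread, and the widget is assumed to be a Tkinter Text widget (or anything with the same delete and insert methods).

```python
def show_result(text_widget, munged):
    """Replace the widget's contents with the munged password.

    text_widget is assumed to behave like a Tkinter Text widget;
    show_result is a hypothetical helper, not something from the thread.
    """
    text_widget.delete("1.0", "end")   # clear any previous result
    text_widget.insert("end", munged)  # pass the string itself, not repr() or a list
```

With a real GUI this would be called with a Text instance and a Unicode string, so the widget receives the characters themselves and no escaping takes place.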
