
I'm using the Notepad++ editor on Windows with the format set to ASCII. I've read "PEP 263: Source Code Encodings" and amended my code accordingly (I think), but some characters are still printing in hex...

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys

a_munge = [ "A", "4", "/\\", "\@", "/-\\", "^", "aye", "?" ]
b_munge = [ "B", "8", "13", "I3", "|3" , "P>", "|:", "!3", "(3", "/3", "3","]3" ]
c_munge = [ "C", "<", "(", "{", "(c)" ]
d_munge = [ "D", "|)", "|o", "?", "])", "[)", "I>", "|>", " ?", "T)", "0", "cl" ]
e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]
         .
         .
         .
  • So, did you try actually changing Notepad++'s file format to UTF-8? Commented Jan 23, 2010 at 13:44

3 Answers

2

Perhaps you should be using unicode literals (e.g. u'€') instead.


7 Comments

Well, I just tried that and I got: File "C:\Users\admin\Desktop\python\passgen.py", line 9 e_replace_list = [ "E", "3", "&", u"Ç", "ú", "[-", "|=-", "?" ] SyntaxError: (unicode error) 'utf8' codec can't decode byte 0x80 in positio unexpected code byte
1) Your file isn't UTF-8. 2) They should all be unicode literals. farmdev.com/talks/unicode
...informative presentation, thanks, although I'm not sure I'm much wiser... To clarify what you said, "They should all be unicode literals": when you say "all", do you mean all characters not included in the ASCII set? I've done this and it runs, but non-ASCII characters are still printed in unicode hex, e.g. € = u'\u20ac'
Then you should consider showing the code that actually does the work.
print e_munge is the way I'm doing it at the moment, just for debugging purposes, but eventually the characters will be printed to a Tkinter GUI.
2

The line:

# -*- coding: UTF-8 -*-

declares that the source file is saved in UTF-8. Saving the file in any other encoding is an error.
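
A minimal sketch of why a mismatch matters, written for Python 3 (the thread itself uses Python 2). It assumes the file was actually saved as Windows-1252, which is common on Windows but is not confirmed anywhere in the thread. In that code page the Euro sign is the single byte 0x80, which is not valid UTF-8, and that would be consistent with the "can't decode byte 0x80" error quoted in the comments on the earlier answer.

```python
# Assumption: the editor saved the file as Windows-1252 ("ANSI"), not UTF-8.
euro = u"\u20ac"  # the Euro sign, written as an escape

saved_as_cp1252 = euro.encode("cp1252")  # bytes the file would hold in cp1252
saved_as_utf8 = euro.encode("utf-8")     # bytes the file would hold in UTF-8

print(repr(saved_as_cp1252))  # b'\x80'
print(repr(saved_as_utf8))    # b'\xe2\x82\xac'

# Reading the cp1252 bytes as UTF-8 (what the coding line promises) fails.
try:
    saved_as_cp1252.decode("utf-8")
    decode_ok = True
except UnicodeDecodeError as exc:
    decode_ok = False
    print("decode failed: " + exc.reason)
```

If this is what happened, re-saving the file from the editor as UTF-8 makes the declaration true.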

When you declare byte strings in your source code:

e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]

then byte strings like "€" will actually contain the encoded bytes used to save the source file.

When you use Unicode strings instead:

    e_munge = [ u"E", u"3", u"&", u"€", u"£", u"[-", u"|=-", u"?" ]

then when Python reads u"€" from the source file, it uses the declared encoding to decode the bytes of that literal into a Unicode character.

An illustration:

# coding: utf-8
bs = '€'
us = u'€'
print repr(bs)
print repr(us)

OUTPUT:

'\xe2\x82\xac'
u'\u20ac'
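
The relationship between the two values can be sketched by doing the decoding step by hand. This is only an illustration using the bytes from the output above; it should run unchanged on Python 2.7 and Python 3.

```python
bs = b"\xe2\x82\xac"     # the three bytes a UTF-8 file holds for the Euro sign
us = bs.decode("utf-8")  # roughly what the u prefix does with the declared encoding

print(len(bs))                   # 3 (bytes)
print(len(us))                   # 1 (character)
print(us == u"\u20ac")           # True
print(us.encode("utf-8") == bs)  # True
```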

1 Comment

OK, I had already deduced that, but how do I get it to print the character € and not the unicode code...
1

print some_list is in effect print repr(some_list) -- that's why you see \u20ac instead of a Euro character. For debugging purposes, the "unicode hex" is exactly what you need for unambiguous display of your data.
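
A short sketch of the difference, assuming the list from the question with the non-ASCII items written as escapes so that the file's own encoding does not matter. Printing an individual item, or a string built from the items, outputs the characters themselves. Whether the glyphs actually render depends on the encoding of the console being printed to.

```python
# Non-ASCII items are written as escapes: \u20ac is the Euro sign, \u00a3 the pound sign.
e_munge = [u"E", u"3", u"&", u"\u20ac", u"\u00a3", u"[-", u"|=-", u"?"]

# Build one string from the items instead of printing the list object.
joined = u" ".join(e_munge)
print(joined)

# Or print the items one at a time.
for item in e_munge:
    print(item)
```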

You appear to have perfectly OK unicode objects in your list; I suggest that you don't "print" the list to Tkinter.
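
One way this might look, as a rough sketch only: the helper name show_in_text_widget is hypothetical, the widget sizes are arbitrary, and the import assumes Python 3 (on Python 2 the module is named Tkinter). The point is that the Unicode string itself is handed to the widget, with no print and no repr() involved.

```python
def show_in_text_widget(text):
    """Open a window whose Text widget contains `text` (hypothetical helper)."""
    import tkinter as tk  # assumption: Python 3; the module is Tkinter on Python 2

    root = tk.Tk()
    box = tk.Text(root, height=3, width=40)
    box.insert("end", text)  # insert the Unicode string directly
    box.pack()
    root.mainloop()


e_munge = [u"E", u"3", u"&", u"\u20ac", u"\u00a3", u"[-", u"|=-", u"?"]
sample = u" ".join(e_munge)  # the text that would be handed to the widget
```

Calling show_in_text_widget(sample) on a machine with a display should open a window showing the characters.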

3 Comments

Well, I won't be printing all the lists to Tkinter (at least not at one time). The program will be a simple password generator which will allow a user to input a word that they would like to use for a password; the program will then do a pseudo-random munge of the word and output the result to a Tkinter text box so that the user can copy and paste it to wherever... Why do you suggest that I don't output to Tkinter?
You said that "the characters will printed to a Tkinter GUI". I'm merely suggesting that you don't use the Python print statement to send the data to Tkinter for display.
Ok fair enough, I guess my previous comment was a little ambiguous, thanks for your input.
