5

It's been a long day and I'm a bit stumped.

I'm reading a binary file that contains lots of wide-char strings and I want to dump these out as Python unicode strings. (To unpack the non-string data I'm using the struct module, but I don't know how to do the same with the strings.)

For example, reading the word "Series":

myfile = open("test.lei", "rb")
myfile.seek(44)
data = myfile.read(12)

# data is now 'S\x00e\x00r\x00i\x00e\x00s\x00'

How can I decode that raw wide-char data into a Python unicode string?

Edit: I'm using Python 2.6

2
  • file isn't supposed to be used to open files; open is. codecs.open is great if this is really a text file but one in a somewhat weird encoding. Commented Apr 30, 2010 at 17:43
  • Mike G - quite right, I've corrected the example. Actually I normally use 'open', but something was screwy with my ipython shell today and it gave me an obscure error. I'd probably overwritten it with something else. Commented Apr 30, 2010 at 23:34

4 Answers

8
>>> data = 'S\x00e\x00r\x00i\x00e\x00s\x00'
>>> data.decode('utf-16')
u'Series'
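
One caveat worth noting: the plain 'utf-16' codec looks for a byte order mark and, when none is present, falls back to the machine's native byte order. The sample bytes carry no BOM and are little-endian, so naming the byte order explicitly is likely more portable. A minimal sketch, assuming every string in the file is little-endian; the helper name read_wide_string and the in-memory stand-in for the file are made up for illustration:

```python
import io

def read_wide_string(f, offset, nbytes):
    """Read nbytes at offset from a binary file object and decode them as UTF-16-LE."""
    f.seek(offset)
    raw = f.read(nbytes)
    return raw.decode('utf-16-le')

# Stand-in for open("test.lei", "rb"): 44 filler bytes, then "Series" as UTF-16-LE.
fake_file = io.BytesIO(b'\xff' * 44 + b'S\x00e\x00r\x00i\x00e\x00s\x00')
text = read_wide_string(fake_file, 44, 12)
print(text)
```

With a real file, the same call would take the object returned by open(..., "rb") in place of fake_file.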

3

I also recommend calling rstrip('\x00') after decoding to remove any trailing '\x00' characters, unless of course you need them.

>>> data = 'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
>>> print '"%s"' % data.decode('utf-16').rstrip('\x00')
"Some Data"

Without rstrip('\x00') the result keeps the two trailing NUL characters, which many terminals display as spaces:

>>> print '"%s"' % data.decode('utf-16')
"Some Data  "
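
If the string sits in a fixed-size buffer, whatever follows the first terminator may be leftover data rather than zeros, and rstrip would not remove it. A sketch of cutting at the first NUL instead, assuming little-endian data and a NUL-terminated convention (neither is confirmed by the question; decode_until_nul is a hypothetical helper name):

```python
def decode_until_nul(raw):
    """Decode UTF-16-LE bytes and keep only the text before the first NUL character."""
    text = raw.decode('utf-16-le')
    return text.split(u'\x00', 1)[0]

# "Some Data", a NUL terminator, then two stray characters left in the buffer.
padded = b'S\x00o\x00m\x00e\x00 \x00D\x00a\x00t\x00a\x00\x00\x00X\x00Y\x00'
clean = decode_until_nul(padded)
```

When the padding is known to be all zeros, rstrip('\x00') and this helper give the same result.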

2

If the string is known not to contain any characters beyond U+00FF, another possibility is to drop the zero bytes by slicing, which yields a plain string rather than a unicode object:

>>> 'S\x00e\x00r\x00i\x00e\x00s\x00'[::2]
'Series'
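
To illustrate why the U+00FF limit matters: for little-endian input, the slice gives the same bytes as decoding and re-encoding as Latin-1, but it silently keeps only the low byte of anything wider. A small sketch, assuming little-endian data (narrow is a hypothetical helper, not part of any library):

```python
def narrow(raw):
    """Decode UTF-16-LE and re-encode as Latin-1.

    Raises UnicodeEncodeError for characters beyond U+00FF instead of
    silently dropping their high byte.
    """
    return raw.decode('utf-16-le').encode('latin-1')

data = b'S\x00e\x00r\x00i\x00e\x00s\x00'
sliced = data[::2]          # same bytes as narrow(data) for this input

# The euro sign (U+20AC) is b'\xac\x20' in UTF-16-LE; slicing keeps only b'\xac'.
euro = u'\u20ac'.encode('utf-16-le')
lossy = euro[::2]
```

The slice is cheaper, while the codec round trip fails loudly when the assumption does not hold.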

0

Hmm, why do you say "open" is preferable to "file"? I see in the reference (Python 2.5):

3.9 File Objects

File objects are implemented using C's stdio package and can be created with the built-in constructor file() described in section 2.1, "Built-in Functions." [3.6]

Footnote 3.6: file() is new in Python 2.2. The older built-in open() is an alias for file().
