4

I wrote a python script to create a binary file of integers.

import struct  
pos = [7623, 3015, 3231, 3829]  
inh = open('test.bin', 'wb')  
for e in pos:  
    inh.write(struct.pack('i', e))  
inh.close()

It worked well, then I tried to read the 'test.bin' file using the below code.

import struct  
inh = open('test.bin', 'rb')  
for rec in inh:  
    pos = struct.unpack('i', rec)  
    print pos  
inh.close()

But it failed with an error message:

Traceback (most recent call last):   
   File "readbinary.py", line 10, in <module>  
   pos = struct.unpack('i', rec)  
   File "/usr/lib/python2.5/struct.py", line 87, in unpack  
   return o.unpack(s)  
struct.error: unpack requires a string argument of length 4

How can I read this file using struct.unpack?

6 Answers 6

8

for rec in inh: reads one line at a time -- not what you want for a binary file. Read 4 bytes at a time (with a while loop and inh.read(4)) instead (or read everything into memory with a single .read() call, then unpack successive 4-byte slices). The second approach is simplest and most practical as long as the amount of data involved isn't huge:

import struct
with open('test.bin', 'rb') as inh:
    indata = inh.read()
for i in range(0, len(data), 4):
    pos = struct.unpack('i', data[i:i+4])  
    print(pos)  

If you do fear potentially huge amounts of data (which would take more memory than you have available), a simple generator offers an elegant alternative:

import struct
def by4(f):
    rec = 'x'  # placeholder for the `while`
    while rec:
        rec = f.read(4)
        if rec: yield rec           
with open('test.bin', 'rb') as inh:
    for rec in by4(inh):
        pos = struct.unpack('i', rec)  
        print(pos)  

A key advantage to this second approach is that the by4 generator can easily be tweaked (while maintaining the specs: return a binary file's data 4 bytes at a time) to use a different implementation strategy for buffering, all the way to the first approach (read everything then parcel it out) which can be seen as "infinite buffering" and coded:

def by4(f):
    data = inf.read()
    for i in range(0, len(data), 4):
        yield data[i:i+4]

while leaving the "application logic" (what to do with that stream of 4-byte chunks) intact and independent of the I/O layer (which gets encapsulated within the generator).

Sign up to request clarification or add additional context in comments.

2 Comments

The first version of by4 will hang once "rec = f.read(4)" returns zero. There is no exit from the while loop.
An alternative is the import functools; reader= functools.partial(f.read, 4); for rec in iter(reader, ''): construct. Otherwise, to avoid functools and its speedup: for rec in iter(lambda: f.read(4), '') )
5

I think "for rec in inh" is supposed to read 'lines', not bytes. What you want is:

while True:
    rec = inh.read(4) # Or inh.read(struct.calcsize('i'))
    if len(rec) != 4:
        break
    (pos,) = struct.unpack('i', rec)
    print pos

Or as others have mentioned:

while True:
    try:
        (pos,) = struct.unpack_from('i', inh)
    except (some_exception...):
        break

Comments

1

Check the size of the packed integers:

>>> pos
[7623, 3015, 3231, 3829]
>>> [struct.pack('i',e) for e in pos]
['\xc7\x1d\x00\x00', '\xc7\x0b\x00\x00', '\x9f\x0c\x00\x00', '\xf5\x0e\x00\x00']

We see 4-byte strings, it means that reading should be 4 bytes at a time:

>>> inh=open('test.bin','rb')
>>> b1=inh.read(4)
>>> b1
'\xc7\x1d\x00\x00'
>>> struct.unpack('i',b1)
(7623,)
>>> 

This is the original int! Extending into a reading loop is left as an exercise .

Comments

1

You can probably use array as well if you want:

import array  
pos = array.array('i', [7623, 3015, 3231, 3829]) 
inh = open('test.bin', 'wb')  
pos.write(inh)
inh.close()

Then use array.array.fromfile or fromstring to read it back.

Comments

1

This function reads all bytes from file

def read_binary_file(filename):
try:
    f = open(filename, 'rb')
    n = os.path.getsize(filename)
    data = array.array('B')
    data.read(f, n)
    f.close()
    fsize = data.__len__()
    return (fsize, data)

except IOError:
    return (-1, [])

# somewhere in your code
t = read_binary_file(FILENAME)
fsize = t[0]

if (fsize > 0):
    data = t[1]
    # work with data
else:
    print 'Error reading file'

Comments

0

Your iterator isn't reading 4 bytes at a time so I imagine it's rather confused. Like SilentGhost mentioned, it'd probably be best to use unpack_from().

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.