I have two binary input files, firstfile and secondfile. secondfile is firstfile + additional material. I want to isolate this additional material in a separate file, newfile. This is what I have so far:

import os
import struct

origbytes = os.path.getsize(firstfile)
fullbytes = os.path.getsize(secondfile)
numbytes = fullbytes-origbytes

with open(secondfile,'rb') as f:
    first = f.read(origbytes)
    rest = f.read()

My natural inclination is to do this, which seems to work:

with open(newfile,'wb') as f:
    f.write(rest)

I can't find it now, but I thought I read on SO that I should pack the data with struct.pack before writing it to the file. The following gives me an error:

with open(newfile,'wb') as f:
    f.write(struct.pack('%%%ds' % numbytes,rest))

-----> error: bad char in struct format

This works however:

with open(newfile,'wb') as f:
    f.write(struct.pack('c'*numbytes,*rest))

And for the versions that work, this check gives me the right answer:

with open(newfile,'rb') as f:
    test = f.read()

len(test)==numbytes

-----> True

Is this the correct way to write a binary file? I want to make sure I'm doing this part correctly, so I can tell whether the second part of the file really is corrupted (as another reader program I feed newfile to is reporting) or whether I'm simply doing this wrong. Thank you.

4 Answers

3

If you know that secondfile is the same as firstfile + appended data, why even read in the first part of secondfile?

with open(secondfile,'rb') as f:
    f.seek(origbytes)
    rest = f.read()

As for writing things out,

with open(newfile,'wb') as f:
    f.write(rest)

is just fine. The struct step would just be a no-op anyway, since it returns the same bytes you pass in. The only thing you might consider is the size of rest: if it could be large, you may want to read and write the data in blocks rather than all at once.
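A minimal sketch of the block-wise approach, using the standard-library shutil.copyfileobj to stream the tail of the file. The function name extract_appended and the 1 MiB default block size are illustrative choices, not something from the original answer:

```python
import shutil


def extract_appended(secondfile, newfile, origbytes, blocksize=1024 * 1024):
    """Copy everything after the first origbytes bytes of secondfile into newfile."""
    with open(secondfile, 'rb') as src, open(newfile, 'wb') as dst:
        src.seek(origbytes)  # skip the part that duplicates firstfile
        # copyfileobj reads at most `blocksize` bytes per call and writes each
        # block out, so the whole tail never has to sit in memory at once.
        shutil.copyfileobj(src, dst, blocksize)
```

Usage would look like extract_appended(secondfile, newfile, os.path.getsize(firstfile)). The block size only affects memory use and speed, not the bytes written.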


2

There is no reason to use the struct module, which is for converting between binary formats and Python objects. There's no conversion needed here.

Strings in Python 2.x are just arrays of bytes and can be read from and written to files directly. (In Python 3.x, read() returns a bytes object, which serves the same purpose, provided you open the file in binary mode with open(filename, 'rb').)
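To illustrate the difference, here is a small Python 3 sketch (the sample bytes are made up): struct earns its keep when converting a Python value such as an integer into a fixed binary layout, while packing a bytes object with a matching 's' format simply hands back the same bytes.

```python
import struct

data = b'abcde'  # stands in for the bytes returned by f.read() in 'rb' mode

# struct's actual job: converting Python values to/from a packed binary layout,
# here an unsigned 32-bit little-endian integer.
packed = struct.pack('<I', 1)
print(packed)                          # b'\x01\x00\x00\x00'
print(struct.unpack('<I', packed)[0])  # 1

# Packing bytes with an 's' format of the same length returns identical bytes,
# so it adds nothing when the goal is to copy data unchanged.
print(struct.pack('%ds' % len(data), data) == data)  # True
```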

So you can just read the file into a string, then write it again:

import os

origbytes = os.path.getsize(firstfile)
fullbytes = os.path.getsize(secondfile)
numbytes = fullbytes-origbytes

with open(secondfile,'rb') as f:
    f.seek(origbytes)  # skip the bytes that are identical to firstfile
    rest = f.read()

with open(newfile,'wb') as f:
    f.write(rest)

1 Comment

Thanks for the clarification on the use of struct.
1
  1. You don't need to read the first origbytes bytes; just move the file pointer to the right position: f.seek(origbytes)
  2. You don't need struct packing; just write rest to newfile.
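Since numbytes is the length of the appended data, another option is to seek relative to the end of the file rather than the start. This sketch assumes secondfile is exactly firstfile plus the appended data; read_appended is a hypothetical helper name:

```python
import os


def read_appended(secondfile, numbytes):
    """Return the last numbytes bytes of secondfile."""
    with open(secondfile, 'rb') as f:
        # whence=os.SEEK_END makes the offset relative to the end of the file,
        # so a negative offset steps back numbytes bytes from the end.
        f.seek(-numbytes, os.SEEK_END)
        return f.read()
```

Under that assumption this returns the same bytes as seeking to origbytes from the start; end-relative seeks like this require the file to be opened in binary mode.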

0

This is not C; struct format strings don't use %. What you want is:

f.write(struct.pack('%ds' % numbytes,rest))

It worked for me:

>>> struct.pack('%ds' % 5,'abcde')
'abcde'

Explanation: '%%%ds' % 15 evaluates to '%15s', while what you want is '%ds' % 15, which evaluates to '15s'.
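A short Python 3 check of the format strings involved (numbytes and rest here are only example values):

```python
import struct

numbytes = 5
rest = b'abcde'

# '%%' is an escaped percent sign, so this format leaves a literal '%' behind.
print('%%%ds' % numbytes)  # %5s
# '%d' alone is replaced by the number, giving the struct code for a 5-byte string.
print('%ds' % numbytes)    # 5s

# '%' is not a struct format character, so the first version is rejected.
try:
    struct.pack('%%%ds' % numbytes, rest)
except struct.error as exc:
    print('struct.error:', exc)

print(struct.pack('%ds' % numbytes, rest))  # b'abcde'
```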

2 Comments

Ah, right, thank you -- but do you know if this is otherwise the correct way to segment a binary file?
@crippledlambda No idea, but it sounds like it will work, as long as the second file is just the first with more data on the end.
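If you want to confirm that assumption before trusting the extracted data, one possible check is to compare firstfile against the start of secondfile block by block. starts_with is a hypothetical helper, and the block size is arbitrary:

```python
import os


def starts_with(secondfile, firstfile, blocksize=1024 * 1024):
    """Return True if secondfile begins with the exact bytes of firstfile."""
    if os.path.getsize(secondfile) < os.path.getsize(firstfile):
        return False  # secondfile is too short to contain firstfile
    with open(firstfile, 'rb') as a, open(secondfile, 'rb') as b:
        while True:
            chunk = a.read(blocksize)
            if not chunk:
                return True  # reached the end of firstfile with no mismatch
            if b.read(len(chunk)) != chunk:
                return False
```

If this returns False, the "appended data" premise does not hold, and slicing at origbytes would not isolate the new material.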
