10

I'm trying to figure out the best way to read a packed binary file in Go that was produced by Python like the following:

import struct
f = open('tst.bin', 'wb')
fmt = 'iih' #please note this is packed binary: 4byte int, 4byte int, 2byte int
f.write(struct.pack(fmt,4, 185765, 1020))
f.write(struct.pack(fmt,4, 185765, 1022))
f.close()

I have been tinkering with some of the examples I've seen on GitHub and a few other sources, but I can't seem to get anything working correctly (the update below shows a working method). What is the idiomatic way to do this sort of thing in Go? This is one of several attempts:

UPDATE and WORKING

package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

func main() {
    fp, err := os.Open("tst.bin")
    if err != nil {
        panic(err)
    }
    defer fp.Close()

    lineBuf := make([]byte, 10) // 4 byte int, 4 byte int, 2 byte int per record

    for {
        // io.ReadFull either fills the whole buffer or returns an error,
        // whereas a plain Read may return fewer bytes than requested
        _, err := io.ReadFull(fp, lineBuf)
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err) // e.g. io.ErrUnexpectedEOF for a truncated record
        }

        aVal := int32(binary.LittleEndian.Uint32(lineBuf[0:4])) // same as: int32(uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24)
        bVal := int32(binary.LittleEndian.Uint32(lineBuf[4:8]))
        cVal := int16(binary.LittleEndian.Uint16(lineBuf[8:10])) // same as: int16(uint16(b[0]) | uint16(b[1])<<8)
        fmt.Println(aVal, bVal, cVal)
    }
}
4
  • 1
    Not being a Python dev there's only so much I can tell you ... but having had a quick look at the documentation for the struct.pack method, your fmt of iih means "32 bit Integer, 32 bit Integer, 16 bit Short". Your struct in Go has three 32 bit integers ... not two 32 bit integers and a 16 bit short. There are also a few mentions of padding/alignment in the Python documentation, so you may need to take that into consideration. Commented Dec 4, 2015 at 1:37
  • 1
    Thank you Simon - that got me looking a bit closer at the data types and sizes. Python i is 4 bytes and h is 2 bytes - I've updated my code and am able to read the data and get the correct values now. Now I need to figure out how to loop through the file. Commented Dec 4, 2015 at 3:44
  • You might want to have a look into Protocol Buffers which were created for exactly your use case. Works like a charm for me, though I use them for Golang to Java. Commented Dec 4, 2015 at 10:55
  • Thank you for the tip Markus. I will check that out. I figured out A solution but would certainly like to use the most appropriate solution. I also want to write the data too (same 4 byte and 2 byte ints) so I'll look into the Protocol Buffers for that too. Commented Dec 4, 2015 at 11:18
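
Since the last comment also asks about writing the same 4 byte and 2 byte ints, here is a minimal sketch of the writing side in Go using only the standard library. It assumes the data is little-endian; the names record and encodeRecords are made up for this illustration, and the bytes.Buffer stands in for a file opened with os.Create.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// record mirrors the Python format "iih": two 4-byte ints and one 2-byte int.
// The type name is hypothetical; the fields are exported so that
// encoding/binary can also fill them when reading.
type record struct {
	A int32
	B int32
	C int16
}

// encodeRecords serializes the records back to back in little-endian order.
// encoding/binary adds no padding, so each record occupies 10 bytes.
func encodeRecords(recs []record) ([]byte, error) {
	var buf bytes.Buffer
	for _, r := range recs {
		if err := binary.Write(&buf, binary.LittleEndian, r); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := encodeRecords([]record{{4, 185765, 1020}, {4, 185765, 1022}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data))         // total size of two 10-byte records
	fmt.Printf("% x\n", data[:10]) // hex dump of the first record
}
```

binary.Write accepts any io.Writer, so passing an *os.File instead of the buffer should produce a file with the same layout as the Python script on a little-endian machine.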

4 Answers

5

Google's "Protocol Buffers" are a portable and rather easy way to handle the problem. Though this is too late now since you got it working, I put some effort into explaining and coding it, so I am posting it anyway.

You can find the code on https://github.com/mwmahlberg/ProtoBufDemo

You need to install Protocol Buffers for Python using your preferred method (pip, OS package management, source) and for Go.

The .proto file

The .proto file is rather simple for our example. I called it data.proto

syntax = "proto2";
package main;

message Demo {
  required uint32 A = 1;
  required uint32 B = 2;

  // A shortcoming: no 16 bit ints
  // We need to enforce the 16 bit range in the applications
  required uint32 C = 3;
}

Now you need to call protoc on the file and have it provide the code for both Python and Go:

protoc --go_out=. --python_out=. data.proto

which generates the files data_pb2.py and data.pb.go. Those files provide the language specific access to the protocol buffer data.

When using the code from GitHub, all you need to do is to issue

go generate

in the source directory.

The Python code

import data_pb2

def main():

    # We create an instance of the message type "Demo"...
    data = data_pb2.Demo()

    # ...and fill it with data
    data.A = 5
    data.B = 5
    data.C = 2015

    print("* Python writing to file")

    # Note that "data.SerializeToString()" counterintuitively
    # returns binary data
    with open('tst.bin', 'wb') as f:
        f.write(data.SerializeToString())

    read = data_pb2.Demo()
    with open('tst.bin', 'rb') as f:
        read.ParseFromString(f.read())

    print("* Python reading from file")
    print("\tDemo.A: %d, Demo.B: %d, Demo.C: %d" % (read.A, read.B, read.C))

if __name__ == '__main__':
    main()

We import the file generated by protoc and use it. Not much magic here.

The Go File

package main

//go:generate protoc --python_out=. data.proto
//go:generate protoc --go_out=. data.proto
import (
    "fmt"
    "os"

    "github.com/golang/protobuf/proto"
)

func main() {

    // Note that we do not handle any errors for the sake of brevity
    d := Demo{}
    f, _ := os.Open("tst.bin")
    fi, _ := f.Stat()

    // We create a buffer which is big enough to hold the entire message
    b := make([]byte,fi.Size())

    f.Read(b)

    proto.Unmarshal(b, &d)
    fmt.Println("* Go reading from file")

    // Note the explicit pointer dereference, as the generated fields are pointers
    fmt.Printf("\tDemo.A: %d, Demo.B: %d, Demo.C: %d\n",*d.A,*d.B,*d.C)
}

Note that we do not need to explicitly import the generated code, as the package declared in data.proto is main.
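
The comment in the .proto file says that the 16 bit range of C has to be enforced by the applications. A minimal sketch of such a check on the Go side follows. The helper name toInt16 is hypothetical, and it assumes the target type is the int16 from the original question.

```go
package main

import (
	"fmt"
	"math"
)

// toInt16 is a hypothetical helper: it converts a uint32 field value to
// int16 and returns an error if the value is too large to fit.
func toInt16(v uint32) (int16, error) {
	if v > math.MaxInt16 {
		return 0, fmt.Errorf("value %d does not fit into a signed 16 bit integer", v)
	}
	return int16(v), nil
}

func main() {
	for _, v := range []uint32{2015, 70000} {
		c, err := toInt16(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(c)
	}
}
```

With the generated code above, the check would be applied to the dereferenced field, for example toInt16(*d.C).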

The result

After generating the required files and compiling the source, when you issue

$ python writer.py && ./ProtoBufDemo

the result is

* Python writing to file
* Python reading from file
    Demo.A: 5, Demo.B: 5, Demo.C: 2015
* Go reading from file
    Demo.A: 5, Demo.B: 5, Demo.C: 2015

Note that the Makefile in the repository offers a shortcut for generating the code, compiling the .go files and running both programs:

make run

2 Comments

Markus, thank you so much for the example! I'll be stepping through and comparing. I will be reading some large binary files at times up to ~ 80GB so this is helpful. As you know there is a huge difference between working and working well with enough performance!
@ChrisTownsend Feel free to use the code from the repo: It is licensed under the polite version of the WTFPL.
4

The Python format string is iih, meaning two 32-bit signed integers and one 16-bit signed integer (see the docs). You can simply use your first example but change the struct to:

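package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

// Fields must be exported so binary.Read can write into them
type binData struct {
    A int32
    B int32
    C int16
}

func main() {
    fp, err := os.Open("tst.bin")
    if err != nil {
        panic(err)
    }
    defer fp.Close()

    for {
        thing := binData{}
        err := binary.Read(fp, binary.LittleEndian, &thing)
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err) // e.g. io.ErrUnexpectedEOF for a truncated record
        }

        fmt.Println(thing.A, thing.B, thing.C)
    }
}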

Note that the Python packing didn't specify the endianness explicitly, but if you're sure the system that ran it generated little-endian binary, this should work.
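
To illustrate why the byte order matters, here is a small sketch that decodes the same four bytes both ways. The helper name decodeBoth is made up for this example. On the Python side, a format string that starts with '<', such as '<iih', selects little-endian explicitly and removes the guesswork.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeBoth (a hypothetical helper) interprets the first four bytes of b
// as a 32-bit signed integer in little-endian and in big-endian order.
func decodeBoth(b []byte) (le, be int32) {
	le = int32(binary.LittleEndian.Uint32(b))
	be = int32(binary.BigEndian.Uint32(b))
	return le, be
}

func main() {
	// The value 4 as written by a little-endian machine.
	raw := []byte{0x04, 0x00, 0x00, 0x00}
	le, be := decodeBoth(raw)
	fmt.Println(le) // 4
	fmt.Println(be) // 67108864
}
```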

Edit: Added main() function to explain what I mean.

Edit 2: Capitalized struct fields so binary.Read could write into them.

7 Comments

The Python format string iih means two 4 byte ints followed by a 2 byte int per the documentation on packed binary via the Python struct: docs.python.org/2/library/struct.html I figured out A solution, posted in an update, and am now looking for alternative solutions that would be more appropriate.
If all your binary data is structured like this, I recommend the struct approach, as you can create a slice of binData structs and access each one's a,b,c attributes easily with the correct type.
mjois, I'm not seeing how to go from the 4 and 2 byte integer arrays to the struct. In my current working example I can go from those 4 and 2 byte arrays to the int32 and int16 but I don't see how to take it directly from those byte arrays to the struct. Now, certainly I could assign what I have to the struct and then create a struct that contains the binData struct but that would seem to have a lot of overhead. Any thoughts, examples of how you'd go about it?
You don't need the integer arrays or byte arrays at all, you can just call binary.Read with the struct address as the third argument (you had it as your first attempt originally). Even better, you can create a slice []binData, loop through the input and read a struct every iteration, then append to the slice. Even further better, if you know in advance you are going to read N structs, you can do slice := make([]binData, N); binary.Read(fp, binary.LittleEndian, slice).
I just tried...and it worked. I did have to change a,b,c to A,B,C to get them exported though, so binary.Read could fill them in. And I promise you this is the most Go-elegant way to do this.
1

As I mentioned in my post, I'm not sure this is THE idiomatic way to do this in Go but this is the solution that I came up with after a fair bit of tinkering and adapting several different examples. Note again that this unpacks 4 and 2 byte int into Go int32 and int16 respectively. Posting so that there is a valid answer in case someone comes looking. Hopefully someone will post a more idiomatic way of accomplishing this but for now, this works.

package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

func main() {
    fp, err := os.Open("tst.bin")
    if err != nil {
        panic(err)
    }
    defer fp.Close()

    lineBuf := make([]byte, 10) // 4 byte int, 4 byte int, 2 byte int per record

    for {
        // io.ReadFull either fills the whole buffer or returns an error,
        // whereas a plain Read may return fewer bytes than requested
        _, err := io.ReadFull(fp, lineBuf)
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err) // e.g. io.ErrUnexpectedEOF for a truncated record
        }

        aVal := int32(binary.LittleEndian.Uint32(lineBuf[0:4])) // same as: int32(uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24)
        bVal := int32(binary.LittleEndian.Uint32(lineBuf[4:8]))
        cVal := int16(binary.LittleEndian.Uint16(lineBuf[8:10])) // same as: int16(uint16(b[0]) | uint16(b[1])<<8)
        fmt.Println(aVal, bVal, cVal)
    }
}

0

Try the binpacker library.

Example:

Example data:

buffer := new(bytes.Buffer)
packer := binpacker.NewPacker(buffer)
unpacker := binpacker.NewUnpacker(buffer)
packer.PushByte(0x01)
packer.PushUint16(math.MaxUint16)

Unpack:

var val1 byte
var val2 uint16
var err error
val1, err = unpacker.ShiftByte()
val2, err = unpacker.ShiftUint16()

Or:

var val1 byte
var val2 uint16
var err error
unpacker.FetchByte(&val1).FetchUint16(&val2)
unpacker.Error() // Make sure error is nil
