
I'm detecting whether very large (30+ GB) files are the same. Rather than hash all 30 GB, I thought I'd hash the first megabyte, then the megabyte starting at 10% into the file, then the megabyte starting at 20% into the file, and so on. Detecting whether 10 million bytes are identical is good enough for my purposes.

In Ruby or JavaScript, I'd just create a 10 MB buffer, read 1 MB into it, seek ahead in the file, read another 1 MB into the buffer, seek ahead, and so on, then hash the buffer.

In Go, I'm a little confused about how to do this, since Read, ReadFull, ReadAtLeast, and similar functions all seem to take a buffer as an argument and then read until they fill it. So I could allocate eleven separate buffers, fill ten of them with separate 1 MB chunks, then concatenate those into the last one to hash... but that seems inefficient and wasteful. I'm sure I'm missing something, but scouring the docs is only confusing me further. What's a suitable solution to this problem in Go? Can I simply ask to read n bytes into a pre-existing buffer?

  • If it's a Y-size file, why create an X-size buffer from N sections read independently? Hashing can be done incrementally over much smaller pieces, and each piece can be fetched in a single read. (Only the unhandled data from the previous read needs to be preserved.) A streaming reading/hashing API would even hide those details. Commented Jul 20, 2017 at 18:25
  • (It seems like it's a fine technical question; for a different problem.) Commented Jul 20, 2017 at 18:29

1 Answer


You can slice the []byte buffer you pass to Read or ReadFull.

Slicing a slice yields a new slice that points to the same backing array, so allocate the full buffer once and pass the sub-slice you want filled:

r.Read(buf[i : i+chunkSize])

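or, since a single Read may return fewer bytes than requested, use io.ReadFull to fill the whole sub-slice: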

io.ReadFull(r, buf[i:i+chunkSize])

https://play.golang.org/p/Uj626v-GE6
