9

What is the best hash method for an array of byte?

The arrays are serialized class objects containing jpeg image passed between applications over TCP/IP.

The array size is about 200k.

0

4 Answers 4

12

Any of the built-in hashing functions should do; depending on how much you care about collisions these are your options (from most collisions to least):

  • MD5
  • SHA1
  • SHA256
  • SHA384
  • SHA512

They are as simple to use as:

var hash = SHA1.Create().ComputeHash(data);

Bonus Marks: If you don't care about security (which I don't think you do given that you are getting the hashes for images) you might want to look into Murmur hash, which is designed for content hashing and not secure hashing (and is thus much faster). It isn't, however, in the framework so you will have to find an implementation (and you should probably go for Murmur3).

Edit: If you are looking for a HASHCODE for a byte[] array it's entirely up to you, it usually consists of bit shifting (by primes) and XORing. E.g.

public class ByteArrayEqualityComparer : IEqualityComparer<byte[]>
{
    public static readonly ByteArrayEqualityComparer Default = new ByteArrayEqualityComparer();
    private ByteArrayEqualityComparer() { }

    public bool Equals(byte[] x, byte[] y)
    {
        if (x == null && y == null)
            return true;
        if (x == null || y == null)
            return false;
        if (x.Length != y.Length)
            return false;
        for (var i = 0; i < x.Length; i++)
            if (x[i] != y[i])
                return false;
        return true;
    }

    public int GetHashCode(byte[] obj)
    {
        if (obj == null || obj.Length == 0)
            return 0;
        var hashCode = 0;
        for (var i = 0; i < obj.Length; i++)
            // Rotate by 3 bits and XOR the new value.
            hashCode = (hashCode << 3) | (hashCode >> (29)) ^ obj[i];
        return hashCode;
    }
}
// ...
var hc = ByteArrayEqualityComparer.Default.GetHashCode(data);

EDIT: If you want to validate that the value hasn't changed you should use CRC32.

Sign up to request clarification or add additional context in comments.

4 Comments

thank you for the answer, I need quick byte[] array content comparison only, there is no need for encripted hashes. I need to make sure sent data remains the same as it is recieved on the other end
@Chesnokov so why didn't you ask for that in the first place?
I meant comparison by hash value, as in the question, data is sent over internet along with a hash. On the other end hash is recomputed and compared to make sure there were no modifications on the data during transfer
@Chesnokov - I added a link for CRC32, which is what you should be using.
5

Jon Skeet has a good answer on how to override GetHashCode, which is based on general effective hash techniques where you start with a prime number, add it to the hash codes of the components multiplied by another prime number, allowing for overflow.

For your case, you would do:

static int GetByteArrayHashCode(byte[] array)
{
    unchecked
    {
        int hash = 17;

        // Cycle through each element in the array.
        foreach (var value in array)
        {
            // Update the hash.
            hash = hash * 23 + value.GetHashCode();            
        }

        return hash;
    }
}

Note in Jon's answer he goes into why this is better than XORing the hashes of the individual elements (and that anonymous types in C# currently do not XOR the hashes of the individual elements, but use something similar to the above).

While this will be faster than the hash algorithms from the System.Security.Cryptography namespace (because you are dealing with smaller hashes), the downside is that you might have more collisions.

You would have to test against your data and determine how often you are getting collisions vs. the work that needs to be done in the case of a collision.

2 Comments

Is foreach slower than for? Also, no need to call GetHashCode on byte as it just returns its value cast to int.
@DrewNoakes Pretty sure that the compiler changes foreach on arrays to for. That's an implementation detail though, and generally you should test if you see this is a bottleneck. Also, same with the return value of GetHashCode for byte.
4

Based on the Compiler Generated GetHashCode()

public static int GetHashCode(byte[] array) {
    unchecked {
        int i = 0;
        int hash = 17;
        int rounded = array.Length & ~3;

        hash = 31 * hash + array.Length;

        for (; i < rounded; i += 4) {
            hash = 31 * hash + BitConverter.ToInt32(array, i);
        }

        if (i < array.Length) {
            int val = array[i];
            i++;

            if (i < array.Length) {
                val |= array[i] << 8;
                i++;

                if (i < array.Length) {
                    val |= array[i] << 16;
                }
            }

            hash = 31 * hash + val;
        }

        return hash;
    }
}

Ah... and a link to C# Murmurhash http://landman-code.blogspot.com/2009/02/c-superfasthash-and-murmurhash2.html

2 Comments

Nice answer, but that is Murmur2 which has problems with repeating data (it collides quite frequently if there is). I don't know of any C# ports of Murmur3.
Murmur3 implementation blog.teamleadnet.com/2012/08/…
2

Any of the crypto hashing stuff should work. Not sure about the speed. Perhaps MD5?

3 Comments

are there any custom methods in .NET for byte[] array quick comparison only, I do not need encryption yet
@Chesnokov that sounds like a different question; like: stackoverflow.com/questions/43289/…
oh, no. I need a fast method to get 32 bit value for byte[] array. The serialized object is sent along with its hash to other machine where hash is recomputed and compared

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.