What does GetHashCode() calculate when invoked on the byte[] array?
The 2 data arrays with equal content do not provide the same hash.
6 Answers
Arrays in .NET don't override Equals or GetHashCode, so the value you'll get is basically based on reference equality (i.e. the default implementation in Object) - for value equality you'll need to roll your own code (or find some from a third party). You may want to implement IEqualityComparer<byte[]> if you're trying to use byte arrays as keys in a dictionary etc.
EDIT: Here's a reusable array equality comparer which should be fine so long as the array element handles equality appropriately. Note that you mustn't mutate the array after using it as a key in a dictionary, otherwise you won't be able to find it again - even with the same reference.
using System;
using System.Collections.Generic;
public sealed class ArrayEqualityComparer<T> : IEqualityComparer<T[]>
{
// You could make this a per-instance field with a constructor parameter
private static readonly EqualityComparer<T> elementComparer
= EqualityComparer<T>.Default;
public bool Equals(T[] first, T[] second)
{
if (first == second)
{
return true;
}
if (first == null || second == null)
{
return false;
}
if (first.Length != second.Length)
{
return false;
}
for (int i = 0; i < first.Length; i++)
{
if (!elementComparer.Equals(first[i], second[i]))
{
return false;
}
}
return true;
}
public int GetHashCode(T[] array)
{
unchecked
{
if (array == null)
{
return 0;
}
int hash = 17;
foreach (T element in array)
{
hash = hash * 31 + elementComparer.GetHashCode(element);
}
return hash;
}
}
}
class Test
{
static void Main()
{
byte[] x = { 1, 2, 3 };
byte[] y = { 1, 2, 3 };
byte[] z = { 4, 5, 6 };
var comparer = new ArrayEqualityComparer<byte>();
Console.WriteLine(comparer.GetHashCode(x));
Console.WriteLine(comparer.GetHashCode(y));
Console.WriteLine(comparer.GetHashCode(z));
Console.WriteLine(comparer.Equals(x, y));
Console.WriteLine(comparer.Equals(x, z));
}
}
6 Comments
GetHashCode should scan over the entire sequence. Interestingly, the internal implementation for Array.IStructuralEquatable.GetHashCode only considers the last eight items of an array, sacrificing hash uniqueness for speed.SequenceEqual is optimized to compare lengths first if the source implements appropriate interfaces.Memory<T>, Span<T> or Sequence<T> can this code be optimised in any way? For example we do have SequenceEqual for ReadOnlySpan<T> now.Like other non-primitive built-in types, it just returns something arbitrary. It definitely doesn't try to hash the contents of the array. See this answer.
Comments
Simple solution
public static int GetHashFromBytes(byte[] bytes)
{
return new BigInteger(bytes).GetHashCode();
}
6 Comments
(2^32) in 1 chance of collision, which is negalegible for most scenarios but is something that must be kept in mind whenever there's a hash code.If you are using .NET 6 or at least .NET Core 2.1, you can write less code and achieve better performance with the System.HashCode struct.
Using the method HashCode.AddBytes() which is available from .NET 6:
public int GetHashCode(byte[] value)
{
var hash = new HashCode();
hash.AddBytes(value);
return hash.ToHashCode();
}
Using the method HashCode.Add which is available from .NET Core 2.1:
public int GetHashCode(byte[] value) =>
value.Aggregate(new HashCode(), (hash, i) => {
hash.Add(i);
return hash;
}).ToHashCode();
Note that in the documentation of HashCode.AddBytes() it says:
This method does not guarantee that the result of adding a span of bytes will match the result of adding the same bytes individually.
In this sharplab demo both output the same result, but this might vary with .NET version or runtime environment.
Comments
If it's not the same instance, it will return different hashes. I'm guessing it is based on the memory address where it is stored somehow.
IStructuralEquatablerequires use of the non-genericIEqualityComparerinterface, which means that eachbytein the array is going to be boxed during computation of the hashcode.