1

Question

What's the best way to implement a function that given an object is able to return a hash key?

The requirements would be:

  • HashCodeFn(((bool?)false, "example")) != HashCodeFn(((bool?)null, "example"))
  • Is relatively cheap to calculate
  • Works with any type without any specific requirement (e.g. [Serializable] attribute)

What I've tried

I've tried with .GetHashCode but:

  • It's unreliable for things like null vs 0 vs false
  • Needs to be implemented for every type

I've tried with:

    private static int GetHashKey<T>(T input)
    {
        using var memoryStream = new MemoryStream();
        BinaryFormatter formatter = new BinaryFormatter();
        formatter.Serialize(memoryStream, input);
        memoryStream.Position = 0;
        using var reader = new StreamReader(memoryStream);
        return reader.ReadToEnd().GetHashCode();
    }

but:

  • It requires that all types in the object tree implement [Serializable] (some types I have no control over and don't implement those)

I'm thinking of serialising the object to JSON in the most compact form and then get the GetHashCode of that string, but I'm not sure how well it works with something like NodaTime.Instant. Is that the fastest way to accomplish this?


Specific use case

This is used as a data loader key (see github.com/graphql/dataloader for an example) if that is of any help to understand the use case.

Specifically a data loader key is used to handle batching. When you have many requests with input (a, b, c) and you want to "pivot" on, for example, a (which means that (1, b, c), (2, b, c), (3, b, c) should call a batch function fn([1, 2, 3], (b, c)) then you need to be able to define a key that is the same for the same values of (b, c) to use as the data loader key.

From the input perspective, specifying or not a bool on something like b, for example, is considered to be 2 different things and should be batched on two different functions.

If I were to use (b, c).GetHashCode() then I it would consider ((bool?)false, "ok") and ((bool?)null, "ok") to be the same thing, therefore batching them to the same batch function yielding unexpected results.

12
  • Does this answer your question? What is the best algorithm for overriding GetHashCode? Commented Jan 16, 2020 at 16:35
  • @PavelAnikhouski Doesn't seem so because it relies on .GetHashCode for inner fields which doesn't fit the requirements. For example 0.GetHashCode() and false.GetHashCode() yield the same value. Commented Jan 16, 2020 at 16:40
  • What do you mean GetHashCode is "unreliable for things like null vs 0 vs false"? Commented Jan 16, 2020 at 16:44
  • For example 0.GetHashCode() and false.GetHashCode() yield the same value and therefore don't satisfy the first requirement. Commented Jan 16, 2020 at 16:45
  • 1
    @ShoeDiamente I am asking why that is a requirement. Why is 0 hashing to the same thing as false a problem? Commented Jan 16, 2020 at 16:45

1 Answer 1

1

I don't think there's any particularly efficient way to do what you want. Some sort of additional processing will be required to make sure you're getting appropriate hash codes. Also, keep in mind that if the classes you don't control already implement Equals and GetHashCode and Equals returns true for instance that differ only by something like a nullable boolean being false or null, then it's incorrect for GetHashCode to return different values.

You could serialise to JSON to achieve what you want. That will exclude any fields that might happen to be annotated for exclusion. Assuming none of the fields relevant to a hash code are excluded then that'll work. Alternatively you could write extension functions for the types that are going to cause clashes and customise the hashing for those fields. Then use reflection (which is likely to be used in serialising to JSON as well) to iterate over class members and get hash codes using your extensions where necessary. Something along the lines of the code below.

class ThingToHash
{
    public bool? CouldBeFalseOrNullOrNull { get; }
    public int IncludesZero { get; }
    public string CanBeEmptyOrNull { get; }
    private string Hidden { get; }

    public ThingToHash(bool? couldBeFalseOrNull, int includesZero, string canBeEmptyOrNull)
    {
        CouldBeFalseOrNullOrNull = couldBeFalseOrNull;
        IncludesZero = includesZero;
        CanBeEmptyOrNull = canBeEmptyOrNull;
    }
}

static class StringExtensions
{
    public static int GetAltHashCode(this string toHash)
    {
        return toHash?.GetHashCode() ?? 17;
    }
}

static class NullableBoolExtensions
{
    public static int GetAltHashCode(this bool? toHash)
    {
        return toHash?.GetAltHashCode() ?? true.GetHashCode() * 19;
    }
}

static class BoolExtensions
{
    public static int GetAltHashCode(this bool toHash)
    {
        if (false == toHash)
        {
            return true.GetHashCode() * 17;
        }

        return toHash.GetHashCode();
    }
}

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(false.GetHashCode());
        Console.WriteLine(((bool?)null).GetHashCode());
        Console.WriteLine(false == (bool?)null);

        Console.WriteLine(HashUnknownObject(new ThingToHash(null, 0, "")));
        Console.WriteLine(HashUnknownObject(new ThingToHash(false, 0, "")));

        Console.ReadKey();
    }

    static int HashUnknownObject(Object toHash)
    {
        PropertyInfo[] members = toHash.GetType().GetProperties(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
        int hash = 17;

        foreach (PropertyInfo memberToHash in members)
        {
            object memberVal = memberToHash.GetValue(toHash);

            if (null == memberVal)
            {
                if (typeof(bool?) == memberToHash.PropertyType)
                {
                    hash += 31 * ((bool?)null).GetAltHashCode();
                }
                else if (typeof(string) == memberToHash.PropertyType)
                {
                    hash += 31 * ((string)null).GetAltHashCode();
                }
            }
            else
            {
                hash += 31 * memberToHash.GetValue(toHash).GetHashCode();
            }
        }

        return hash;
    }
}

You'd obviously have to add other checks to use the bool extension, add other extensions and so on to cover the cases you need. And do testing to check the impact of using reflection to serialise. You could reduce that for classes that already implement GetHashCode, for instance, by not generating hash codes per member for those.

And this code can obviously be cleaned up. It's just quick and dirty here.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.