17

I'm trying to choose a hash algorithm for comparing about max 20 different text data.

Which hash is better for these requirements?

  • Less CPU Consumption
  • Small footprint (<=32 bytes)
  • Collision is not a big deal
  • Can be generated from .NET Framework 2 (shouldn't be a 3rd party library)

I'm using hash for less memory footprint and comparison performance

4
  • Can you elaborate on what you mean by "Can be generated from .NET Framework 2", do you mean something that already exists in the BCL or is something that can easily be implemented yourself acceptable? Commented Dec 21, 2008 at 19:34
  • Can you clarify "Can be generated from .NET Framework 2 (shouldn't be a 3rd party library)" Do you mean "the hash itself must be generated from an algorithm that exists in the framework" or "the Algorithm must be generated from types in the framework"? Commented Dec 21, 2008 at 19:35
  • I mean a native function or library instead of an external huge project or DLL dependency. Commented Dec 21, 2008 at 22:14
  • 30
    Have you considered using one or more of the following general purpose hash functions: partow.net/programming/hashfunctions/index.html they are extremely fast and efficient. Commented Nov 5, 2009 at 8:38

8 Answers 8

10

If collision is not a big deal you can take the first letter of each document. Or you can use the length of the text or the string with the text.

Sign up to request clarification or add additional context in comments.

2 Comments

Or a combination of the first (few) character(s) and length. Of course, you can use the entire string as the hash. =]
this sounds reasonable actually.
7

Paul Hsieh has a decent, simple, fast, 32-bit SuperFastHash that performs better than most existing hash functions, is easier to understand/implement, and sounds like it meets your criteria.

3 Comments

I've used this myself for hash maps, and it works very well (little collision) in my cases.
Yeah, I heard this guy is some kind of genius or something. His hash function is awesome-sauce! :)
@paul: Nah he's not a genius, just really good at self-marketing :)
4

The FNV hash is a well-known fast hashing algorithm. It is not cryptographically secure, but it sounds like you don't need a secure hash.

2 Comments

Is there an FNV implementation in the BCL?
There is a BCL team blog entry about this very question: blogs.msdn.com/bclteam/archive/2003/10/31/49719.aspx
2

A very quick check would be to take the length of a text and XOR it with the first 4 bytes of it and use that as a hash. If this is good enough it is extremely fast because independent of the number of bytes of the file.

Comments

0

Check out the serie Peter Karkowski published on his blog.

Comments

0

If you are constrained to algorithms that exist in the framework

Is MD5 small enough (16 bytes)?

Less CPU Consumption and Small footprint are usually mutually exclusive.

http://en.wikipedia.org/wiki/Time-space_tradeoff

3 Comments

It's small but CPU usage is not that efficient, I'd rather use MD4 (as a non-brainer) or CRC32 alike stuff.
That's right. I was confused because it's usually displayed as a 32 digit hex number.
AFAIK neither MD4 or CRC32 are implemented in the BCL. If don't want to roll your own or use a 3rd party, just pay 15% tax for the convenience of MD5
0

How long does the hash need to hold for? GetHashCode() is pretty accessible, gives a small response (4 bytes), which should be fine (re minimizing collisions) over 20 strings.

However, GetHashCode() should not be persisted to database - it is fine for in-memory comparisons, though. Just be aware that the algorithm may change between frameworks (and did between 1.1 and 2.0).

The other advantage of this is that it is trivial to use - just use a Dictionary<string,Something>, which will deal with all the hashing etc for you.

2 Comments

If I use dictionary<string,smt> I assume it'll store the whole string in the memory.
Yes, but hashes always have a risk of collision if you don't follow up with an actual equality check.
0

I had the same request for myselve and i implemented xxHashSharp . Just make sure you take the appropriate library ( x32 vs x64). It's also available outside of c# here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.