1

I would like to generate a human-readable hash with customized properties -- e.g., a short string of specified length consisting entirely of upper case letters and digits excluding 0, 1, O, and I (to eliminate visual ambiguity):

"arbitrary string"  -->  "E3Y7UM8"

A 7-character string of the above form could take on over 34 billion unique values which, for my purposes, makes collisions extremely unlikely. Security is also not a major concern.

Is there an existing module or routine that implements something like the above? Alternatively, can someone suggest a straightforward algorithm?

7
  • 2
    Oh, man, hashing might be difficult to implement. You know, committees consisting of many people work hard to invent a new hashing algorithm. Commented Mar 16, 2016 at 17:30
  • 2
    The best solution might be to use an existing hash, then implement a simple transformation into a form you like. Commented Mar 16, 2016 at 17:33
  • @ForceBru: Are you thinking of cryptographic hashes? Those take a lot of specialized expertise to do right, but non-cryptographic hashes are pretty straightforward. Also, there's no requirement that this hash be in any way a new invention. Commented Mar 16, 2016 at 17:34
  • What's the problem you're actually trying to solve with this? Commented Mar 16, 2016 at 17:38
  • I want to generate unique, human-transcribable referral codes. Something a customer could enter to claim a discount or similar. And I want them to be (effectively) uniquely tied to the email address of the referrer. Commented Mar 16, 2016 at 17:41

2 Answers 2

2

The method you should be using has similarities with password one-way encryption. Of course since you are going for readable, a good password function is probably out of the question.

Here's what I would do:

  1. Take an MD5 hash of the email
  2. Convert base32 which already eliminates O and I
  3. Replace any non-readable characters with readable ones

Here's an example based on the above:

 import base64 # base32 is a function in base64
 import hashlib

 email = "[email protected]"

 md5 = hashlib.md5()
 md5.update(email.encode('utf-8'))

 hash_in_bytes = md5.digest()

 result = base64.b32encode(hash_in_bytes)

 print(result)

 # Or you can remove the extra "=" at the end

 result = result.strip(b'=')

Since it's a one-way function (hash), you obviously don't need to worry about reversing the process (you can't anyway). You can also replace any other characters you find non-readable with readable ones (I would go for lowercase versions of the characters, e.g. q instead of Q)

More about base32 here: https://docs.python.org/3/library/base64.html

Sign up to request clarification or add additional context in comments.

1 Comment

With the added step of truncating the result to fewer (e.g. 7) characters, the above solution does what I need very nicely.
2

You can simply truncate the beginning of an MD5sum algorithm. It should have approximately the same statistical properties than the whole string anyway:

import md5
m = md5.new()
m.update("arbitrary string")
print(m.hexdigest()[:7])

Same code with hashlib module:

import hashlib
m = hashlib.md5()
m.update("arbitrary string")
print(m.hexdigest()[:7])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.