Customized hash function for Python

Question

I would like to generate a human-readable hash with customized properties -- e.g., a short string of specified length consisting entirely of upper case letters and digits excluding 0, 1, O, and I (to eliminate visual ambiguity):

"arbitrary string"  -->  "E3Y7UM8"

A 7-character string of the above form could take on over 34 billion unique values which, for my purposes, makes collisions extremely unlikely. Security is also not a major concern.

Is there an existing module or routine that implements something like the above? Alternatively, can someone suggest a straightforward algorithm?

Oh, man, hashing might be difficult to implement. You know, committees consisting of many people work hard to invent a new hashing algorithm. — ForceBru
– ForceBru, Commented Mar 16, 2016 at 17:30
The best solution might be to use an existing hash, then implement a simple transformation into a form you like. — kindall
– kindall, Commented Mar 16, 2016 at 17:33
@ForceBru: Are you thinking of cryptographic hashes? Those take a lot of specialized expertise to do right, but non-cryptographic hashes are pretty straightforward. Also, there's no requirement that this hash be in any way a new invention. — user2357112
– user2357112, Commented Mar 16, 2016 at 17:34
What's the problem you're actually trying to solve with this? — jonrsharpe
– jonrsharpe, Commented Mar 16, 2016 at 17:38
I want to generate unique, human-transcribable referral codes. Something a customer could enter to claim a discount or similar. And I want them to be (effectively) uniquely tied to the email address of the referrer. — Grant Petty
– Grant Petty, Commented Mar 16, 2016 at 17:41

gschizas · Accepted Answer · 2018-03-08 08:11:02Z

2

The method you should be using has similarities with password one-way encryption. Of course since you are going for readable, a good password function is probably out of the question.

Here's what I would do:

Take an MD5 hash of the email
Convert base32 which already eliminates O and I
Replace any non-readable characters with readable ones

Here's an example based on the above:

 import base64 # base32 is a function in base64
 import hashlib

 email = "[email protected]"

 md5 = hashlib.md5()
 md5.update(email.encode('utf-8'))

 hash_in_bytes = md5.digest()

 result = base64.b32encode(hash_in_bytes)

 print(result)

 # Or you can remove the extra "=" at the end

 result = result.strip(b'=')

Since it's a one-way function (hash), you obviously don't need to worry about reversing the process (you can't anyway). You can also replace any other characters you find non-readable with readable ones (I would go for lowercase versions of the characters, e.g. q instead of Q)

More about base32 here: https://docs.python.org/3/library/base64.html

edited Mar 8, 2018 at 8:11

answered Mar 16, 2016 at 18:00

gschizas

902 silver badges9 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Grant Petty Over a year ago

With the added step of truncating the result to fewer (e.g. 7) characters, the above solution does what I need very nicely.

perror · Accepted Answer · 2016-03-16 18:27:17Z

2

You can simply truncate the beginning of an MD5sum algorithm. It should have approximately the same statistical properties than the whole string anyway:

import md5
m = md5.new()
m.update("arbitrary string")
print(m.hexdigest()[:7])

Same code with hashlib module:

import hashlib
m = hashlib.md5()
m.update("arbitrary string")
print(m.hexdigest()[:7])

edited Mar 16, 2016 at 18:27

answered Mar 16, 2016 at 17:35

perror

7,51616 gold badges63 silver badges89 bronze badges

Collectives™ on Stack Overflow

Customized hash function for Python

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related