0

I need a hash function to take a sequence of decimal numbers and return a decimal number as hash value.

for example:

>> def my_simple_hash(*args):
    return reduce(lambda x1, x2: 2*x1 + x2,  args) 

>>> my_simple_hash(1,3,4)
14
>>> my_simple_hash(1,4,3)
15
>>> my_simple_hash(4,3,1)
23

My questions are:

  1. does python has a built-in lib to do this more efficiently?
  2. how could I make the output hash value in a relative small range?

Question 2 explanation:

because 1, 3, 4 has six different combinations as following:

1,3,4
1,4,3
3,1,4
3,4,1
4,1,3
4,3,1

the corresponding output is [14, 15, 18, 21, 21, 23], and I expect the hash values of the six output would be something like [1,2,3,4,6](a small range)

any suggestions would be appreciated. thanks in advance :-)

4
  • I would generally advise not to roll your own, and instead go with a hash that’s virtually guaranteed not to have collisions (hashlib.sha256()). Commented Sep 3, 2016 at 10:38
  • That strongly depends on what you want - sha256 is a cryptographic hash function that is relatively slow to calculate, takes bytes-input and produces bytes output. good non-cryptographic hashes are fast to compute, have an even distribution of resulting values and not too many collisions. Commented Sep 3, 2016 at 11:08
  • What do you intend to do with this hash? How large is the domain of the input integers, what's a typical size of an input sequence, and how large should the range of the output be? How important is collision avoidance? Your needs may be satisfied by a variant of format-preserving encryption, either a full cryptographic-strength implementation, or a faster, simpler version. Commented Sep 3, 2016 at 11:08
  • @PM2Ring, I want to use a single decimal num to identify a sequence of numbers. Commented Sep 3, 2016 at 11:36

1 Answer 1

1

If you just want to hash a number sequence you can do

def my_hash(*args):
    return hash(args)

which returns the hash (for the current run of the program) of the args-tuple (hash is fast and well tested for builtin-types) - but this is still most often a large number.

For getting a smaller value you can take the modulo like

def my_hash(*args):
    return hash(args)%10 # or whatever number you like

Actually you could also use

def my_hash(*args):
    return sum(args)%10 # or whatever number you like

which doesnt change between runs of the program but sum does not distribute the results evenly at all.

Warning: These are not cryptographical hashes

Sign up to request clarification or add additional context in comments.

5 Comments

hi, @janbrohl, does python2 have a similar function like hash?
Yes (same name) - It possibly returns different values but it is essentially the same
Most stuff didnt change (much) from Python 2.7 to 3.x and simple programs work on both requiring ony minimal changes (or none at all)
I marked the answer as accepted, but for the second question, the output will be in range 0 to 9 if I use %10 and thus cause collisions. I will use a dict to map every hash values to my desire numbers.
for less collisions ditribute the hashes over a larger range of numbers by using bigger values for %-ing than 10

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.