1

I am trying to implement a custom hashing method (multiplication method):

def h(k, M):
    y = hash(k)
    T = (math.sqrt(5)-1)/2
    return(int(M*((y*T) - int(y * T))))

It always returns zero. I tested it and (y*T) -> returns floating value (for example 10,666666). int(y * T) -> returns integer value (for example 10). But if i do (y*T) - int(y * T) it always returns 0.0. My target is to call something like h('test', 10) and get a number as return, but it always returns 0.0. Why is it so?

7
  • 1
    seems like M is undefined Commented Oct 29, 2013 at 21:13
  • 1
    You forgot a : after h(k) Commented Oct 29, 2013 at 21:13
  • Your question still remains unclear. Commented Oct 29, 2013 at 21:17
  • When I run this code and do h('test',10) it outputs 7, not 0. Commented Oct 29, 2013 at 21:26
  • wierd, is use Python 3.3.2 and doing h('test',10) returns always 0. Hank Ditton, which version did you use? can it be python version related? Commented Oct 29, 2013 at 21:28

2 Answers 2

4

Are you running on a 64-bit system? If so, y will be a 64-bit integer, and T is about 0.6, so, for example,

>>> import random
>>> y = random.randrange(2**64) # some 64-bit int
>>> y
17364376918466400468
>>> yt = y * 0.6
>>> yt
1.041862615107984e+19
>>> yt - int(yt)
0.0

A float only has 53 bits of precision, so the odds greatly favor that there will be no bits "after the decimal point" when converting a 64-bit int to a float.

On a 32-bit system, hash() returns 32-bit ints, so this problem won't appear.

If this is the problem, then you could try various workarounds, like adding:

y = abs(y)
y = (y >> 32) ^ (y & 0xffffffff)  # collapse to 32 bits
Sign up to request clarification or add additional context in comments.

1 Comment

yes, it was the problem! Adding the workaround fixed it. Big thanks!
1

This problem stems down to the way floating point numbers are stored in a computer.

In short: They are stored as a limited amount of significant figures, a base, and an exponent. The computer then knows to scale the significant figures by the base raised to the exponent in order to obtain the value. The amount of data is machine specific: for a 32-bit machine, 23-bits are used for the significant figure, 8 for the exponent, and 1 for the base, a 64-bit machine will have 53 bits for the sig figs, 8 for the exponent, and 1 for the base.

Addition and Subtraction are then done by adding/subtracting the difference between the significant digits and the exponent.

You are generating very large integers for hash(k) and trying to take the difference between the rounded down int(y*T) and the floating point y*T. When the Python interpreter tries to take the difference between a float and an int, it converts the int to a floating point y*T will store a certain number of significant figures. The problem arises when you try to get a low order of magnitude difference from two high order magnitude numbers, or generally anytime the order of magnitude for the difference differs greatly from the numbers involved. The low order significant digits will then be lost in calculation.

Here is the a version I edited to test out your method. The added argument c is a constant which I suspect will help normalize your results.

import math

def h(k,M,c):
    y = hash(k)
    print "hash = ", y
    T = (math.sqrt(5)-1)/(2*c)
    print "y*T = ", y*T
    print "int(y*T) = ", int(y*T)
    print "(y*T) - int(y * T) = ",(y*T) - int(y * T)
    print "M*((y*T) - int(y * T)) = ", M*((y*T) - int(y * T))
    return(int(M*((y*T) - int(y * T))))

print(h('test',2,c))

As you increase c, essentially making the difference in the two numbers occur at a closer and closer order of magnitude, you begin to see that the value for (y*T) - int(y * T) moves away from 0. Sample output is below:

>>>h('test',2,10)
hash =  2314058222102390712
y*T =  1.43016663321e+17
int(y*T) =  143016663320543088
(y*T) - int(y * T) =  0.0
M*((y*T) - int(y * T)) =  0.0
h(test,2,10) =  0
>>>h('test',2,1000)
hash =  2314058222102390712
y*T =  1.43016663321e+15
int(y*T) =  1430166633205430
(y*T) - int(y * T) =  0.75
M*((y*T) - int(y * T)) =  1.5
h(test,2,1000) =  1

>>>h('test',2,10000000)
hash =  2314058222102390712
y*T =  1.43016663321e+11
int(y*T) =  143016663320
(y*T) - int(y * T) =  0.543090820312
M*((y*T) - int(y * T)) =  1.08618164062
h(test,2,10000000) =  1

>>>h('test',2,10000000000000)
hash =  2314058222102390712
y*T =  143016.663321
int(y*T) =  143016
(y*T) - int(y * T) =  0.66332054307
M*((y*T) - int(y * T)) =  1.32664108614
h(test,2,10000000000000) =  1

As an added example of the phenomenon I am talking about:

y = hash('test')
print y
y = float(y)
print y
y = int(y)
print y

outputs:

2314058222102390712
2.3140582221e+18
2314058222102390784

Just by simply switching to a float and back to an int the last two digits are no longer reliable, so it can be seen that anything below this will also be lost.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.