
I'm battling some floating-point problems with Pandas' read_csv function. While investigating, I found this:

In [15]: a = 5.9975

In [16]: a
Out[16]: 5.9975

In [17]: np.float64(a)
Out[17]: 5.9974999999999996

Why are Python's built-in float and NumPy's np.float64 type giving different results? I thought they were both C++ doubles.

  • Note also that the Pandas read_csv function employs its own super-fast string-to-float conversion that is not correctly rounded. Thus after exporting a value and re-reading it, the recovered value may end up being 1 or 2 ulps different from the original. Commented Nov 24, 2014 at 8:41
  • @MarkDickinson, does this apply to read_excel too? Commented May 23, 2022 at 6:44
  • @Gathide No idea, I'm afraid. Commented May 23, 2022 at 14:26
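Regarding the read_csv comment above: pandas.read_csv accepts a float_precision parameter, and the value "round_trip" selects the round-trip converter, which is meant to parse values back exactly. A minimal sketch, assuming pandas is installed; the column name price and the in-memory CSV text are made up for illustration:

```python
import io

import pandas as pd

# A tiny in-memory CSV; "price" is a hypothetical column name.
csv_text = "price\n5.9975\n"

# float_precision="round_trip" asks the C parser to use the round-trip
# converter instead of the default fast converter.
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

value = df["price"].iloc[0]
print(value == 5.9975)  # True
```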


>>> numpy.float64(5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
>>> (5.9975).hex()
'0x1.7fd70a3d70a3dp+2'

They are the same number. What differs is the textual representation produced by their __repr__ methods: the native Python type outputs the minimal number of digits needed to uniquely distinguish values, while NumPy before version 1.14.0 (released in 2018) didn't try to minimise the number of digits it output.
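A small sketch of the same point: the bit patterns match, and the two printed strings are just different decimal spellings of one double. The ".17g" format is an assumption used here to mimic the longer output in Out[17]; 17 significant digits is always enough to round-trip a binary64 value.

```python
import numpy as np

a = 5.9975
b = np.float64(a)

# Identical IEEE 754 binary64 bit patterns: the stored values are the same.
assert a.hex() == b.hex()
assert a == b

# Python prints the shortest string that parses back to the same double.
shortest = repr(a)             # '5.9975'

# 17 significant digits, similar to the longer form shown in Out[17].
seventeen = format(a, ".17g")  # '5.9974999999999996'

# Both strings parse back to exactly the same double.
assert float(shortest) == float(seventeen) == a
print(shortest, seventeen)
```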


  • By representation, you mean the way it is printed to the screen?
  • Via the __repr__() method or its C-level equivalent, yes.
  • A truly accurate representation would actually be 5.99749999999999960920149533194489777088165283203125, which is the exact decimal value of the 64-bit float you get when you evaluate the float literal 5.9975.
  • @MarkAmery The maximum precision a float64 can reach is close to 10^-16 (unit in the last place (ULP), see en.wikipedia.org/wiki/Floating-point_arithmetic), so the idea of an exact decimal value with significantly more than 16 digits for a floating-point number is misleading.
  • @JonathanNappee: Every numeric binary64 representation does in fact have an exact decimal equivalent. The trouble occurs when we believe that a much less precise decimal value is represented by a given binary64 value.
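Both comments above can be checked with the standard library: Decimal(float) and Fraction(float) convert a double exactly, and math.ulp (Python 3.9+) gives the spacing between neighbouring doubles. A minimal sketch, assuming Python 3.9 or later:

```python
import math
from decimal import Decimal
from fractions import Fraction

x = 5.9975

# Decimal(float) converts the binary64 value exactly, digit for digit.
exact = Decimal(x)
print(exact)

# The same value as an exact ratio; the denominator is a power of two.
ratio = Fraction(x)
print(ratio)

# Gap between adjacent doubles near x (requires Python 3.9+).
ulp = math.ulp(x)
print(ulp)

# The stored double lies just below 5.9975, within half an ulp of it.
error = Decimal("5.9975") - exact
assert 0 < error <= Decimal(ulp) / 2
```

So an exact decimal expansion exists (the commenter's point about binary64), while only about 16 to 17 of its digits are needed to identify the double (the ULP point).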

NumPy's np.float64 scalar type inherits from Python's float, which is implemented as a C double internally. You can verify that as follows:

import numpy as np
isinstance(np.float64(5.9975), float)   # True

So even though their string representations differ, the values they store are the same.

On the other hand, np.float32 wraps a C float (which has no analog in pure Python), and no NumPy integer type (np.int32, np.int64, etc.) inherits from Python int, because in Python 3 int is unbounded:

isinstance(np.float32(5.9975), float)   # False
isinstance(np.int32(1), int)            # False
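The same relationships can be checked at the class level with issubclass. The float32 comparison and the int64 limit below are additions to the answer, included as a sketch of why those types cannot simply be Python float or int:

```python
import numpy as np

# np.float64 subclasses Python float; np.float32 and NumPy integers do not.
print(issubclass(np.float64, float))  # True
print(issubclass(np.float32, float))  # False
print(issubclass(np.int64, int))      # False

# float32 keeps only a 24-bit significand, so 5.9975 is rounded more
# coarsely and no longer equals the double nearest to 5.9975.
f32 = np.float32(5.9975)
print(float(f32) == 5.9975)  # False

# NumPy integers are fixed width, unlike Python's unbounded int.
print(np.iinfo(np.int64).max)  # 9223372036854775807
print(2**63 > np.iinfo(np.int64).max)  # True
```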

So why define np.float64 at all?

np.float64 defines most of the attributes and methods of np.ndarray. The following code shows that np.float64 implements all but four of the public ndarray attributes (the exact list may differ between NumPy versions):

[m for m in set(dir(np.array([]))) - set(dir(np.float64())) if not m.startswith("_")]

# ['argpartition', 'ctypes', 'partition', 'dot']

So if you have a function that expects to call ndarray methods, you can pass it an np.float64, whereas a plain float won't work.

For example:

def my_cool_function(x):
    return x.sum()

my_cool_function(np.array([1.5, 2]))   # <--- OK
my_cool_function(np.float64(5.9975))   # <--- OK
my_cool_function(5.9975)               # <--- AttributeError
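If the function should also accept plain Python floats, one option (not from the original answer) is to coerce the input with np.asarray first. The name tolerant_sum is hypothetical:

```python
import numpy as np

def tolerant_sum(x):
    # np.asarray wraps floats, NumPy scalars, lists and arrays alike as an
    # ndarray (0-d for scalars), so .sum() is always available.
    return np.asarray(x).sum()

print(tolerant_sum(np.array([1.5, 2])))  # 3.5
print(tolerant_sum(np.float64(5.9975)))  # 5.9975
print(tolerant_sum(5.9975))              # 5.9975, no AttributeError

# NumPy scalars also expose array-style attributes such as shape and ndim.
print(np.float64(5.9975).shape, np.float64(5.9975).ndim)  # () 0
```

Calling np.sum(x) directly behaves similarly; coercing once up front just keeps the rest of the function free to use any ndarray method.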
