I'm using numpy's fromfile function to read data from a binary file. The file contains a sequence of values (3 * float32, 3 * int8, 3 * float32) which I want to extract into a numpy ndarray with (rows, 9) shape.

import numpy as np

with open('file/path', 'rb') as my_file:
    my_dtype = np.dtype('>f4, >f4, >f4, >i1, >i1, >i1, >f4, >f4, >f4')
    my_array = np.fromfile(my_file, dtype=my_dtype)

    print(my_array.shape)
    print(type(my_array[0]))
    print(my_array[0])

And this returns:

(38475732,)
<type 'numpy.void'>
(-775.0602416992188, -71.0, -242.5240020751953, 39, 39, 39, 5.0, 2753.0, 15328.0)

  1. How can I get a 2-dimensional ndarray with shape (38475732, 9)?

  2. Why is the returned record of type 'numpy.void'?

Redefined question:

If all the values that I want to read from the file were, for example, 4-byte floats, I would use np.dtype('9>f4') and get what I need. But since my binary file contains different types, is there a way of casting all the values to 32-bit floats?

PS: I can do this using 'struct' to parse the binary file into a list and converting this list into an ndarray afterwards, but this method is much slower than using np.fromfile

Solution:

Thanks, Hpaulj, for your answer! In my code I added the following line to convert the structured array returned by numpy's fromfile function into the expected ndarray:

my_array = my_array.astype('f4, f4, f4, f4, f4, f4, f4, f4, f4').view(dtype='f4').reshape(my_array.shape[0], 9)

This returns an ndarray of shape (38475732, 9).
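
For readers on a newer NumPy, here is a minimal sketch of the same conversion using numpy.lib.recfunctions.structured_to_unstructured instead of the astype/view/reshape chain. It assumes NumPy 1.16 or later, and the four sample records and the temporary file are hypothetical stand-ins for the real data file.

```python
import os
import tempfile

import numpy as np
from numpy.lib import recfunctions as rfn

# Record layout from the question: 3 x float32, 3 x int8, 3 x float32,
# all big-endian. Unnamed fields get the default names 'f0' .. 'f8'.
record_dtype = np.dtype('>f4, >f4, >f4, >i1, >i1, >i1, >f4, >f4, >f4')

# Hypothetical sample data standing in for the real binary file.
sample = np.zeros(4, dtype=record_dtype)
sample['f0'] = [1.5, 2.5, 3.5, 4.5]
sample['f3'] = [39, 40, 41, 42]

# Write the raw bytes to a temporary file so np.fromfile has something to read.
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    tmp.write(sample.tobytes())
    path = tmp.name

try:
    records = np.fromfile(path, dtype=record_dtype)
finally:
    os.remove(path)

# Cast every field to float32 and lay the fields out as columns.
flat = rfn.structured_to_unstructured(records, dtype=np.float32)
print(flat.shape)  # (4, 9)
```

The read itself is still a single np.fromfile call; only the conversion step differs from the one-liner above.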

Cheers!

  • What's the value of sys.byteorder? Commented Nov 26, 2013 at 6:10
  • It returns 'little', but I don't see why this is important... Commented Nov 26, 2013 at 6:20
  • You are using '>', which is for big endian. Did you try using the little-endian data type, which is '<'? Commented Nov 26, 2013 at 6:29
  • The values I get inside the tuple are correct. The problem is rather in defining a dtype expression which maps into a 2d ndarray Commented Nov 26, 2013 at 6:33

2 Answers

What is my_array[[0]]? my_array is a 1-D array of records defined by my_dtype.

my_array[0] is one of those records; it prints like a tuple, but its type is numpy.void. Notice that some entries are floats and some are integers. If it were a row of a 2-D array, all entries would be of the same type (e.g. float).

To convert it to a 2d array of floats, you might try:

np.array(my_array.tolist())
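
A minimal sketch of the tolist() route on a tiny array (the two-record array here is a hypothetical example, not the asker's data). Because tolist() creates a Python object for every value, it is likely to be slow and memory-hungry on tens of millions of records.

```python
import numpy as np

# Hypothetical two-record array with one float field and one int field.
x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])

# tolist() yields a list of Python tuples; np.array() then builds a
# regular 2-D array, promoting the mixed values to a common float type.
as_2d = np.array(x.tolist())
print(as_2d.shape)   # (2, 2)
print(as_2d.dtype)   # float64

# Cast down afterwards if float32 is what you need.
as_f32 = as_2d.astype(np.float32)
```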

Another way is to convert all the fields to the same type and then reshape. Something along these lines (tested on a different structured array):

import numpy as np

x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])
x.astype([('x', '<f8'), ('y', '<f8')]).view(dtype='f8').reshape(2, 2)

See also: How to convert numpy.recarray to numpy.array?

1 Comment

my_array[[0]] = [(-775.0602416992188, -71.0, -242.5240020751953, 39, 39, 39, 5.0, 2753.0, 15328.0)]

Since you require your array to contain different data types, you get a structured array, where each element is a record. You can access fields by name:

>>> my_array.dtype.names
('f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8')
>>> my_array[0]['f1']
-71.0
>>> my_array['f1']
array([-71.], dtype=float32)

A basic ndarray contains elements of the same type. If you need an ndarray with shape (38475732, 9), you have to convert your array to a single type, say floats. See link above.

I can't say exactly why (I haven't used structured arrays much), but the reason for numpy.void is that your custom type, which the array knows about, is not propagated to the individual records as a type of their own. And what would the type of a sub-record be?

>>> my_array[['f0', 'f1']][0]
(-775.0602416992188, -71.0)
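
The field access described above can be sketched on a small stand-in array. The two records and the three-field dtype are hypothetical, chosen only to keep the example short.

```python
import numpy as np

# Hypothetical stand-in: two float32 fields and one int8 field.
# Unnamed fields get the default names 'f0', 'f1', 'f2'.
arr = np.array([(-775.06, -71.0, 39), (1.0, 2.0, 3)], dtype='f4, f4, i1')

print(arr.dtype.names)   # ('f0', 'f1', 'f2')

# A single element of a structured array is a numpy.void scalar;
# its field layout is carried by its dtype.
record = arr[0]
print(type(record), record.dtype)

# Indexing by one field name gives an ordinary 1-D array of that field's type.
column = arr['f1']

# Indexing by a list of names gives another structured array.
sub = arr[['f0', 'f1']]
print(sub.dtype.names)   # ('f0', 'f1')
```

So a single column is already a plain ndarray; the conversion step is only needed when all columns have to share one 2-D array.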
