Given that the width in bytes of each row of a numpy array equals the total width of the fields in a structure defined by a dtype, is there a simple way to convert such an array to a structured array?

For example, my_type defines a data type whose fields total 5 bytes per element: [('checksum','u2'), ('word', 'B', (3,))]. I want to convert the numpy array [[ 1 2 3 4 5] [ 11 12 13 14 15]] to the structured array [( 258, [ 3, 4, 5]) (2828, [13, 14, 15])].

My initial attempt was this:

import numpy as np

# generate data
array = np.array([(1,2,3,4,5), 
    (11,12,13,14,15)], dtype = np.uint8)

# format data
my_type = np.dtype([('checksum','u2'), ('word', 'B', (3,))])
structured_array = np.array([array], dtype=my_type)

But, as expected, because of numpy broadcasting rules, I get the following:

[[[( 1, [ 1,  1,  1]) ( 2, [ 2,  2,  2]) ( 3, [ 3,  3,  3])
   ( 4, [ 4,  4,  4]) ( 5, [ 5,  5,  5])]
  [( 11, [ 11,  11,  11]) (12, [12, 12, 12]) (13, [13, 13, 13])
   (14, [14, 14, 14]) (15, [15, 15, 15])]]]

My current not-so-elegant solution is to loop through the rows of an array and map them to the structure:

structured_array = np.zeros(array.shape[0], dtype=my_type)
for idx, row in enumerate(array):
    for key, value in my_type.fields.items():
        b = row[value[1]:value[1]+value[0].itemsize]
        if len(structured_array[idx][key].shape):
            structured_array[idx][key] = b            
        else:
            structured_array[idx][key] = int.from_bytes(b, byteorder='big', signed=False)

So the question is whether there is a simple, one-line solution to perform this task for an arbitrary data type of a structured array, without parsing bytes of a numpy array?

  • The data for a structured array has to be a list of tuples, with the same layout as in the display. Each tuple is a record or element of the structured array. Commented Oct 12, 2021 at 20:30
  • np.array([(1,[2,3,4])], my_type). The nesting of () and [] is important. Commented Oct 12, 2021 at 20:37
  • recfunctions has an unstructured_to_structured function, but I'm not sure it can handle your dtype. That inner (3,) field makes conversion trickier. Commented Oct 12, 2021 at 21:19
  • Another way is to create a zeros array with the desired shape and dtype, and assign values by field. Commented Oct 12, 2021 at 21:20
  • All I wrote should be covered on the main structured array doc page. numpy.org/doc/stable/user/basics.rec.html Commented Oct 12, 2021 at 21:22
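
As a small illustration of the tuple layout described in these comments, here is a sketch that builds two records by hand. The sample values are made up for the example; only the dtype comes from the question.

```python
import numpy as np

my_type = np.dtype([('checksum', 'u2'), ('word', 'B', (3,))])

# One tuple per record; the (3,) sub-array field is given as a nested list.
# The values below are illustrative only.
records = np.array([(1, [2, 3, 4]), (11, [12, 13, 14])], dtype=my_type)

print(records.shape)          # one record per tuple
print(records['checksum'])    # the 'checksum' field of every record
print(records['word'].shape)  # one row of three bytes per record
```

Note that this assigns values to fields; it does not reinterpret the bytes of an existing array.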

1 Answer

In [222]: x = np.array([[ 0,  2,  3,  4,  5], [ 0, 12, 13, 14, 15]])
In [223]: dt = np.dtype([('checksum','u2'), ('word', 'B', (3,))])

I know from past use that genfromtxt can handle relatively complex dtypes:

In [224]: np.savetxt('temp', x[:,1:], fmt='%d')
In [225]: cat temp
2 3 4 5
12 13 14 15
In [226]: data = np.genfromtxt('temp', dtype=dt)
In [227]: data
Out[227]: 
array([( 2, [ 3,  4,  5]), (12, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

But I haven't dug into its code to see how it maps the flat row data onto the dtype.

But it turns out the unstructured_to_structured that I mentioned in a comment can handle your dtype:

In [228]: import numpy.lib.recfunctions as rf
In [229]: rf.unstructured_to_structured(x[:,1:],dtype=dt)
Out[229]: 
array([( 2, [ 3,  4,  5]), (12, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
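
A self-contained version of the call above, as a sketch. The value 300 is my own addition to the sample data; it is there to show that each input column is converted as a number into one slot of the dtype, rather than being treated as a byte.

```python
import numpy as np
import numpy.lib.recfunctions as rf

dt = np.dtype([('checksum', 'u2'), ('word', 'B', (3,))])

# Four values per row: one for 'checksum' and three for 'word'.
# 300 does not fit in one byte, so this only works because values are converted.
values = np.array([[300, 3, 4, 5], [12, 13, 14, 15]])

out = rf.unstructured_to_structured(values, dtype=dt)
print(out['checksum'].tolist())  # [300, 12]
print(out['word'].tolist())      # [[3, 4, 5], [13, 14, 15]]
```

So the input needs one column per element of the dtype (4 here), not one column per byte (5).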

For simpler dtypes, I and others have often recommended turning the list of lists into a list of tuples:

In [230]: [tuple(row) for row in x[:,1:]]
Out[230]: [(2, 3, 4, 5), (12, 13, 14, 15)]
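
To show where that list of tuples goes next, here is a minimal sketch with a flat dtype. The dtype flat_dt and its field names a to d are hypothetical; they stand in for a "simpler dtype" with one scalar field per column.

```python
import numpy as np

x = np.array([[2, 3, 4, 5], [12, 13, 14, 15]])

# Hypothetical flat dtype: one scalar field per input column.
flat_dt = np.dtype([('a', 'u2'), ('b', 'u1'), ('c', 'u1'), ('d', 'u1')])

# Each row becomes one tuple, and each tuple becomes one record.
rows = [tuple(row) for row in x]
out = np.array(rows, dtype=flat_dt)

print(out['a'].tolist())  # [2, 12]
print(out['d'].tolist())  # [5, 15]
```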

Many of the recfunctions use a field-by-field copy:

In [231]: res = np.zeros(2, dtype=dt)
In [232]: res
Out[232]: 
array([(0, [0, 0, 0]), (0, [0, 0, 0])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
In [233]: res['checksum']= x[:,1]
In [234]: res['word']
Out[234]: 
array([[0, 0, 0],
       [0, 0, 0]], dtype=uint8)
In [235]: res['word'] = x[:,2:]
In [236]: res
Out[236]: 
array([( 2, [ 3,  4,  5]), (12, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
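
The same field-by-field copy can be written as a loop over the dtype's fields. This is only a sketch: fill_by_field is a name I made up, and it assumes a dtype without nested structures, with the input columns in field order.

```python
import math
import numpy as np

def fill_by_field(values, dt):
    """Copy columns of a 2-D value array into the fields of dt, in field order."""
    res = np.zeros(values.shape[0], dtype=dt)
    col = 0
    for name in dt.names:
        shape = dt[name].shape   # () for a scalar field, (3,) for 'word'
        n = math.prod(shape)     # number of columns this field consumes
        block = values[:, col:col + n]
        res[name] = block.reshape((-1,) + shape)
        col += n
    return res

dt = np.dtype([('checksum', 'u2'), ('word', 'B', (3,))])
x = np.array([[2, 3, 4, 5], [12, 13, 14, 15]])
res = fill_by_field(x, dt)
print(res['checksum'].tolist())  # [2, 12]
print(res['word'].tolist())      # [[3, 4, 5], [13, 14, 15]]
```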

byte view

I missed the fact that you wanted to repack bytes. My answer above treats each input row as 4 numbers that are assigned to the 4 slots in the compound dtype. But with uint8 input, and u2 and u1 slots, you want to view the 5 bytes of each row with the new dtype, not make a new array.

In [332]: dt
Out[332]: dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
In [333]: arr = np.array([(1,2,3,4,5),
     ...:     (11,12,13,14,15)], dtype = np.uint8)
In [334]: arr.view(dt)
Out[334]: 
array([[( 513, [ 3,  4,  5])],
       [(3083, [13, 14, 15])]],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

view leaves a trailing dimension of size 1, which we need to remove:

In [335]: _.shape
Out[335]: (2, 1)
In [336]: arr.view(dt).reshape(2)
Out[336]: 
array([( 513, [ 3,  4,  5]), (3083, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
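
For an arbitrary number of rows, a slightly more general sketch of the same idea follows. The itemsize check and the reshape(-1) are my additions, assuming each row holds exactly one record.

```python
import numpy as np

dt = np.dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
arr = np.array([(1, 2, 3, 4, 5),
                (11, 12, 13, 14, 15)], dtype=np.uint8)

# The reinterpretation only makes sense if one row is exactly one record.
assert arr.shape[1] == dt.itemsize

# ascontiguousarray is a no-op for an already contiguous array;
# reshape(-1) drops the trailing length-1 axis for any number of rows.
records = np.ascontiguousarray(arr).view(dt).reshape(-1)

print(records.shape)                   # (2,)
print(records['checksum'].tolist())    # [513, 3083]
print(np.shares_memory(records, arr))  # True: a view, not a copy
```

Because the result shares memory with arr, writing to one changes the other; add .copy() if you need an independent array.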

Changing the endianness of the u2 field gives the values you asked for:

In [337]: dt = np.dtype([('checksum','>u2'), ('word', 'B', (3,))])
In [338]: arr.view(dt).reshape(2)
Out[338]: 
array([( 258, [ 3,  4,  5]), (2828, [13, 14, 15])],
      dtype=[('checksum', '>u2'), ('word', 'u1', (3,))])
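
As a cross-check, the big-endian view can be compared with the int.from_bytes call used in the question's loop. This sketch only reuses the data and dtype shown above.

```python
import numpy as np

arr = np.array([(1, 2, 3, 4, 5),
                (11, 12, 13, 14, 15)], dtype=np.uint8)
dt_be = np.dtype([('checksum', '>u2'), ('word', 'B', (3,))])

records = arr.view(dt_be).reshape(-1)

# Decode the first two bytes of each row by hand, as the question's loop does.
expected = [int.from_bytes(row[:2].tobytes(), byteorder='big') for row in arr]

print(records['checksum'].tolist())  # [258, 2828]
print(expected)                      # [258, 2828]
```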

5 Comments

This doesn't seem to work with the full data range, i.e. x[:,:] instead of x[:,1:].
For some reason I'm starting with a (2,5) int array [222]. I don't remember where I copied it from. To convert a (2,5) byte array, all you may need is a view.
Could you elaborate a bit? For example, after changing to the proper range at 224: np.savetxt('temp', x[:,:], fmt='%d') the output is [(0, [ 2, 3, 4]) (0, [12, 13, 14])]; it misses the second byte of the first field ('checksum','u2'). Also, I have changed the data in the question to better show where the problem with the suggested solutions may be.
I added a view based answer.
The view is what I was looking for. Thanks!
