numpy structured array from arbitrary-level nested dictionary

Question

I have an arbitrary-level nested dictionary that contains field names as keys, and 1-D numpy arrays of the same size as values, e.g.:

d = {'a' : arr1, 'b' : {'b1' : arr2, 'b2' : {'c' : arr3}}}

Is there a simple way to build a numpy structured array from it that reflects the original hierarchy? It would also be good to preserve field name ordering if an OrderedDict is given. The usual np.array, np.asarray, np.rec.array functions do not seem to help.

I still don't really understand what you're trying to do. In what sense are the data 'hierarchical' if each element in arr1 has a corresponding element in arr2 and arr3? How do you want to use the array?
– ali_m, Commented Sep 1, 2015 at 13:13
Usual operations, like slicing, comparison, reshaping, copying, etc. Example: arr = somefunc(d); arr[::2]; arr['b']['b2']['c']
– gregorio.bastardo, Commented Sep 1, 2015 at 13:44
I would iterate through the keys creating a matching nested dtype. Then create an empty array with this dtype and the common array length. Finally go back through the dictionary copying array values to the appropriate fields.
– hpaulj, Commented Sep 1, 2015 at 14:16

hpaulj · Accepted Answer · 2015-09-01 21:05:49Z

In the general case this takes two steps: first construct a compound dtype that mirrors the dictionary layout, then allocate an array of that dtype and fill it with the arrays from the dictionary.

Construct a sample dictionary:

In [94]: arr1=np.arange(10)
In [95]: arr2=np.arange(100.,110.)
In [96]: arr3=np.arange(200,210)
In [98]: d={'a':arr1, 'b':{'b1':arr2, 'b2':{'c':arr3}}}

This function constructs the dtype:

def mkdt(d):
    ll = []
    for k,v in d.items():
        if isinstance(v,np.ndarray):
            ll.append((k,v.dtype))
        else:
            ll.append((k,mkdt(v)))
    return ll

In [176]: dt=np.dtype(mkdt(d)); dt
Out[176]: dtype([('a', '<i4'), ('b', [('b1', '<f8'), ('b2', [('c', '<i4')])])])

This function copies data values from d to A:

def copy_values(d, A):
    if A.dtype.names:
        for n in A.dtype.names:
            copy_values(d[n], A[n])
    else:
        A[:]=d

In [264]: A=np.zeros(d['a'].shape,dt)    
In [265]: copy_values(d,A)
In [266]: A
Out[266]: 
array([(0, (100.0, (200,))), (1, (101.0, (201,))), (2, (102.0, (202,))),
       (3, (103.0, (203,))), (4, (104.0, (204,))), (5, (105.0, (205,))),
       (6, (106.0, (206,))), (7, (107.0, (207,))), (8, (108.0, (208,))),
       (9, (109.0, (209,)))], 
      dtype=[('a', '<i4'), ('b', [('b1', '<f8'), ('b2', [('c', '<i4')])])])
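
The two helpers above can be wrapped into a single call. The following is a minimal sketch rather than part of the original session: the names `first_leaf` and `dict_to_structured` are hypothetical, every leaf is assumed to be a NumPy array of the same shape, and the array shape is read from the first leaf found at runtime instead of from `d['a']`. The dtypes are given explicitly so the result does not depend on the platform's default integer size.

```python
import numpy as np


def mkdt(d):
    """Build a nested dtype description that mirrors the dictionary layout."""
    ll = []
    for k, v in d.items():
        if isinstance(v, np.ndarray):
            ll.append((k, v.dtype))      # leaf: use the array's own dtype
        else:
            ll.append((k, mkdt(v)))      # sub-dictionary: recurse
    return ll


def copy_values(d, A):
    """Recursively copy the arrays in d into the matching fields of A."""
    if A.dtype.names:
        for n in A.dtype.names:
            copy_values(d[n], A[n])      # A[n] is a view, so writes reach A
    else:
        A[...] = d


def first_leaf(d):
    """Return the first array found in a depth-first walk (hypothetical helper)."""
    for v in d.values():
        return v if isinstance(v, np.ndarray) else first_leaf(v)
    raise ValueError("dictionary contains no arrays")


def dict_to_structured(d):
    """Allocate a structured array shaped like the leaves of d and fill it."""
    A = np.zeros(first_leaf(d).shape, np.dtype(mkdt(d)))
    copy_values(d, A)
    return A


d = {'a': np.arange(10, dtype='i4'),
     'b': {'b1': np.arange(100., 110.),
           'b2': {'c': np.arange(200, 210, dtype='i4')}}}
A = dict_to_structured(d)
```

Field order follows the iteration order of the dictionary, so an OrderedDict (or any dict on Python 3.7+) keeps its key order in the resulting dtype.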

(earlier solution)

Here's an interactive (ipython) session that transfers the data from a dictionary like yours to a structured array.

In [94]: arr1=np.arange(10)
In [95]: arr2=np.arange(100,110)
In [96]: arr3=np.arange(200,210)
In [98]: d={'a':arr1, 'b':{'b1':arr2, 'b2':{'c':arr3}}}

The matching dtype:

In [100]: dt=np.dtype([('a','i'), ('b', np.dtype([('b1','i'),('b2',np.dtype([('c','i')]))]))])

Make a zero-filled array of the correct size and type, then fill the fields:

In [102]: A=np.zeros((10,),dt)    
In [104]: A['a']=d['a']
In [105]: A['b']['b1']=d['b']['b1']
In [106]: A['b']['b2']['c']=d['b']['b2']['c']

In [107]: A
Out[107]: 
array([(0, (100, (200,))), (1, (101, (201,))), (2, (102, (202,))),
       (3, (103, (203,))), (4, (104, (204,))), (5, (105, (205,))),
       (6, (106, (206,))), (7, (107, (207,))), (8, (108, (208,))),
       (9, (109, (209,)))], 
      dtype=[('a', '<i4'), ('b', [('b1', '<i4'), ('b2', [('c', '<i4')])])])

If all fields share one dtype (here int) and the source arrays already have that dtype, this array could also be constructed as a view on a 2d array:

np.column_stack([arr1,arr2,arr3]).view(dt).ravel()

This works because the (10,3) array has the same data buffer layout as the structured array.
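
As a standalone illustration of the view approach, here is a small sketch that pins every array to 'i4' so the row size of the stacked array matches the itemsize of the structured dtype. That explicit dtype is an assumption added here: on platforms where np.arange defaults to 64-bit integers, the arrays from the session would not line up with a dtype built from 'i' fields.

```python
import numpy as np

# Nested dtype whose leaves are all 4-byte integers (12 bytes per record).
dt = np.dtype([('a', 'i4'), ('b', [('b1', 'i4'), ('b2', [('c', 'i4')])])])

arr1 = np.arange(10, dtype='i4')
arr2 = np.arange(100, 110, dtype='i4')
arr3 = np.arange(200, 210, dtype='i4')

# (10, 3) C-contiguous array: each row holds one record's three values.
stacked = np.column_stack([arr1, arr2, arr3])

# One row must occupy exactly as many bytes as one structured record.
assert stacked.shape[1] * stacked.dtype.itemsize == dt.itemsize

# Reinterpret each row as a record: shape (10, 1), then flatten to (10,).
V = stacked.view(dt).ravel()
```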

from numpy.lib import recfunctions

gives access to some utility functions.

recfunctions.recursive_fill_fields, for example, can copy data from A to another array of the same dtype (but not from the column_stack). It uses recursion to handle a nested dtype.
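
A short sketch of that copy, assuming the same all-integer nested dtype as above (written with explicit 'i4' fields here); the target array B is a name introduced only for this illustration:

```python
import numpy as np
from numpy.lib import recfunctions

dt = np.dtype([('a', 'i4'), ('b', [('b1', 'i4'), ('b2', [('c', 'i4')])])])

# Source array, filled field by field as in the session above.
A = np.zeros(10, dt)
A['a'] = np.arange(10)
A['b']['b1'] = np.arange(100, 110)
A['b']['b2']['c'] = np.arange(200, 210)

# Target array with the same nested dtype, initially all zeros.
B = np.zeros(A.shape, dt)

# Copies every field of A into the matching field of B, descending
# into the nested fields 'b' and 'b2'.
recfunctions.recursive_fill_fields(A, B)
```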

In [149]: recfunctions.flatten_descr(dt)
Out[149]: (('a', dtype('int32')), ('b1', dtype('int32')), ('c', dtype('int32')))

flattens your nesting.

In [150]: recfunctions.get_fieldstructure(dt)
Out[150]: {'a': [], 'b': [], 'b1': ['b'], 'b2': ['b'], 'c': ['b', 'b2']}

How these functions traverse compound (nested) dtypes might be more useful than what they actually do, so their source code is worth reading.
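
One possible use of get_fieldstructure is to reach a nested field by its leaf name alone. This is only a sketch: `get_nested_field` is a hypothetical helper, not something recfunctions provides, and it assumes field names are unique across all nesting levels.

```python
import numpy as np
from numpy.lib import recfunctions

dt = np.dtype([('a', 'i4'), ('b', [('b1', 'i4'), ('b2', [('c', 'i4')])])])
A = np.zeros(3, dt)
A['b']['b2']['c'] = [7, 8, 9]


def get_nested_field(arr, name):
    """Return a view of field `name`, wherever it sits in the nesting."""
    # Maps each field name to its list of parent fields, outermost first,
    # e.g. 'c' -> ['b', 'b2'].
    parents = recfunctions.get_fieldstructure(arr.dtype)[name]
    for p in parents:
        arr = arr[p]          # step down one nesting level (still a view)
    return arr[name]


c = get_nested_field(A, 'c')  # same data as A['b']['b2']['c']
```

Because each indexing step returns a view, assigning into the result (for example `get_nested_field(A, 'b1')[:] = 5`) modifies A itself.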

thanks, works well :) extended it with getting array size at runtime
