
Note 1: None of the answers given to this question work in my case.

Note 2: The solution must work in NumPy 1.14.

Assume I have the following structured array:

arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

Now I'm slicing into the structured data type like so:

arr2 = arr[['a', 'b']]

And now I'm trying to convert that slice into a regular array:

out = arr2[0].view((np.float32, 2))

which results in

ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

What I would like to get is just a regular array like so:

[105.0, 34.0]

Note that this example is simplified in order to be minimal. In my real use case I'm obviously not dealing with an array that holds one element.

I know that this solution works:

out = np.asarray(list(arr2[0]))

but I thought there must be a more efficient solution than copying data that is already in a NumPy array into a list and then back into an array. I assume there is a way to stay in NumPy and maybe not copy any data at all; I just don't know how.

  • Are you looking for np.array(arr[0].tolist())? Commented Apr 25, 2018 at 17:40
  • @pault yes, but going from a structured array to a list and then to an array is not an efficient solution :/ Commented Apr 25, 2018 at 17:41
  • I can't reproduce your error. arr[0].view((np.float32, len(arr.dtype.names))) works for me. Commented Apr 25, 2018 at 17:48
  • Adding to @pault, I can't recreate the error either, and my NumPy version is 1.11. Commented Apr 25, 2018 at 17:51
  • @pault which NumPy version are you on? Some behavior around structured arrays changed with 1.14, which is what I'm on. Commented Apr 25, 2018 at 17:52

2 Answers


The 1d array does convert with view:

In [270]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [271]: arr
Out[271]: 
array([(105., 34., 145., 217.)],
      dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4')])
In [272]: arr.view('<f4')
Out[272]: array([105.,  34., 145., 217.], dtype=float32)
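Because all four fields are float32 values packed back to back, this view reinterprets the same buffer rather than copying it. A small check, using np.shares_memory and a write through the view (np.float32 is used instead of '<f4' only to stay byte-order neutral):

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

flat = arr.view(np.float32)          # reinterpret each 16-byte record as four float32 values
print(np.shares_memory(flat, arr))   # True: no data was copied

flat[1] = 35.0                       # writing through the flat view...
print(arr['b'])                      # ...changes field 'b' of the structured array
```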

The error only appears when we try to convert a single element:

In [273]: arr[0].view('<f4')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-273-70fbab8f61ba> in <module>()
----> 1 arr[0].view('<f4')

ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

Using view has often required some adjustment of the dimensions. I suspect that this error is a result, intentional or not, of the recent changes to the handling of structured arrays (most evident when indexing several fields at once).

In the whole-array case, view changed the 1d, 4-field array into a 1d, 4-element array: shape (1,) became (4,). Converting a single element, however, would have to go from shape () to (4,).
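The shape bookkeeping can be spelled out. When viewing as a smaller itemsize, the last axis is scaled by the ratio of the itemsizes; an indexed element has shape (), so there is no last axis to scale. A sketch of that arithmetic (the names ratio and expected are just for illustration):

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

# Viewing 16-byte records as 4-byte floats multiplies the last axis by 16 // 4.
ratio = arr.dtype.itemsize // np.dtype(np.float32).itemsize
expected = arr.shape[:-1] + (arr.shape[-1] * ratio,)   # (1,) -> (4,)
print(expected, arr.view(np.float32).shape)

# A single indexed record has shape (): there is no last axis to scale.
print(arr[0].shape)
```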

In the past I have recommended tolist as the surest way around problems with view (and astype):

In [274]: arr[0].tolist()
Out[274]: (105.0, 34.0, 145.0, 217.0)
In [279]: list(arr[0].tolist())
Out[279]: [105.0, 34.0, 145.0, 217.0]
In [280]: np.array(arr[0].tolist())
Out[280]: array([105.,  34., 145., 217.])
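With many records, tolist can be called once on the whole field-selected array instead of once per element; it still round-trips through Python objects, so it is a copy. A sketch, assuming a small 3-record array for illustration:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)] * 3,
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

# tolist on a 1d structured array gives one tuple per record (selected fields only),
# so np.array turns the whole list into an (n_records, n_fields) array in one step.
rows = arr[['a', 'b']].tolist()
out = np.array(rows, dtype=np.float32)
print(out.shape)
```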

item is also a good way of pulling an element out of its numpy structure:

In [281]: arr[0].item()
Out[281]: (105.0, 34.0, 145.0, 217.0)

The result of both tolist and item is a tuple.

You are worried about speed, but here you are only converting one element. It's one thing to worry about speed when using tolist on a 1000-item array, quite another when working with 1 element.

In [283]: timeit arr[0]
131 ns ± 1.31 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [284]: timeit arr[0].tolist()
1.25 µs ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [285]: timeit arr[0].item()
1.27 µs ± 2.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [286]: timeit arr.tolist()
493 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [287]: timeit arr.view('f4')
1.74 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

You could index the element in a way that doesn't reduce the dimension to 0 (not that it helps much with speed):

In [288]: arr[[0]].view('f4')
Out[288]: array([105.,  34., 145., 217.], dtype=float32)
In [289]: timeit arr[[0]].view('f4')
6.54 µs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [290]: timeit arr[0:1].view('f4')
2.63 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [298]: timeit arr[0][None].view('f4')
4.28 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
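One difference between these variants besides timing: basic slicing keeps a view of the original buffer, while indexing with a list (fancy indexing) copies the selected record before the view is taken. A quick check with np.shares_memory:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

by_slice = arr[0:1].view(np.float32)   # basic slicing: still a view of arr's data
by_list = arr[[0]].view(np.float32)    # fancy indexing: the record is copied first

print(np.shares_memory(by_slice, arr))  # True
print(np.shares_memory(by_list, arr))   # False
```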

view still requires a change in shape; consider a big array:

In [299]: arrs = np.repeat(arr, 10000)
In [301]: arrs.view('f4')
Out[301]: array([105.,  34., 145., ...,  34., 145., 217.], dtype=float32)
In [303]: arrs.shape
Out[303]: (10000,)
In [304]: arrs.view('f4').shape
Out[304]: (40000,)

The view is still 1d, whereas we'd probably want a (10000,4) shaped 2d array.

A better view change:

In [306]: arrs.view(('f4',4))
Out[306]: 
array([[105.,  34., 145., 217.],
       [105.,  34., 145., 217.],
       [105.,  34., 145., 217.],
       ...,
       [105.,  34., 145., 217.],
       [105.,  34., 145., 217.],
       [105.,  34., 145., 217.]], dtype=float32)
In [307]: _.shape
Out[307]: (10000, 4)
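The same 2d result can also be had by reshaping the flat float32 view, which avoids the subarray dtype. A sketch, assuming the array is contiguous and every field is float32 with no padding:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
arrs = np.repeat(arr, 10000)

# Flat float32 view, then split it into (records, fields); both steps are views.
nfields = len(arrs.dtype.names)
table = arrs.view(np.float32).reshape(arrs.shape[0], nfields)
print(table.shape)
print(np.shares_memory(table, arrs))
```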

This works with the 1 element array, whether 1d or 0d:

In [308]: arr.view(('f4',4))
Out[308]: array([[105.,  34., 145., 217.]], dtype=float32)
In [309]: _.shape
Out[309]: (1, 4)
In [310]: arr[0].view(('f4',4))
Out[310]: array([105.,  34., 145., 217.], dtype=float32)
In [311]: _.shape
Out[311]: (4,)

This was suggested in one of the answers in your link: https://stackoverflow.com/a/10171321/901925

Contrary to your comment there, it works for me:

In [312]: arr[0].view((np.float32, len(arr.dtype.names)))
Out[312]: array([105.,  34., 145., 217.], dtype=float32)
In [313]: np.__version__
Out[313]: '1.14.0'
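The len(arr.dtype.names) trick only lines up when every field has the same dtype and the fields fill the record with no gaps. A sketch of a helper that checks those conditions before viewing; plain_view is a hypothetical name, not a NumPy function, and it assumes a C-contiguous array with at least one dimension:

```python
import numpy as np

def plain_view(a):
    """Hypothetical helper: view a structured array whose fields share one dtype
    and are packed back to back as a plain array with a trailing field axis."""
    names = a.dtype.names
    base = a.dtype.fields[names[0]][0]
    if any(a.dtype.fields[n][0] != base for n in names):
        raise TypeError("fields do not share a single dtype")
    offsets = [a.dtype.fields[n][1] for n in names]
    if offsets != [i * base.itemsize for i in range(len(names))]:
        raise ValueError("fields are not packed back to back in name order")
    if a.dtype.itemsize != base.itemsize * len(names):
        raise ValueError("record has extra bytes (padding or unselected fields)")
    # view(base) scales the last axis by len(names); reshape splits it back per record.
    return a.view(base).reshape(a.shape + (len(names),))

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
print(plain_view(arr))
```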

With the edit:

In [84]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [85]: arr2 = arr[['a', 'b']]
In [86]: arr2
Out[86]: 
array([(105., 34.)],
      dtype={'names':['a','b'], 'formats':['<f4','<f4'], 'offsets':[0,4], 'itemsize':16})

In [87]: arr2.view(('f4',2))
...
ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged

Note that the arr2 dtype includes an offsets value. In a recent NumPy version, multiple-field selection has changed: it now returns a true view that preserves the original data, all of it, not just the selected fields. The itemsize is unchanged:

In [93]: arr.itemsize
Out[93]: 16
In [94]: arr2.itemsize
Out[94]: 16
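The offsets and itemsize of the multi-field view can be inspected directly. A sketch, assuming a NumPy version where multi-field indexing returns a view (1.14.0 as used here, or a recent release; see the 1.14.1 note below about the temporary revert):

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
arr2 = arr[['a', 'b']]

print(arr2.dtype.itemsize)                                   # the full 16-byte record, not 8
print([arr2.dtype.fields[n][1] for n in arr2.dtype.names])   # byte offsets of 'a' and 'b'
print(np.shares_memory(arr2, arr))                           # arr2 reuses arr's buffer
```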

arr.view(('f4',4)) and arr2.view(('f4',4)) produce the same thing.

So you can't view (change dtype) a partial set of the fields. You have to first take the view of the whole array, and then select rows/columns, or work with tolist.
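A sketch of the "view the whole array, then select columns" route, assuming all fields are float32 with no padding; each wanted field's column index is derived from its byte offset:

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

# First view every record as a row of float32 columns: shape (1, 4).
full = arr.view(np.float32).reshape(arr.shape[0], -1)

# Then map each wanted field to a column via its byte offset.
wanted = ['a', 'b']
cols = [arr.dtype.fields[n][1] // full.itemsize for n in wanted]   # [0, 1]
out = full[:, cols]        # list of columns: fancy indexing, so this is a copy
out_view = full[:, 0:2]    # adjacent columns as a slice: still a view of arr
```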

I'm using 1.14.0. Release notes for 1.14.1 says:

The change in 1.14.0 that multi-field indexing of structured arrays returns a view instead of a copy has been reverted but remains on track for NumPy 1.15. Affected users should read the 1.14.1 Numpy User Guide section "basics/structured arrays/accessing multiple fields" for advice on how to manage this transition.

https://docs.scipy.org/doc/numpy-1.14.2/user/basics.rec.html#accessing-multiple-fields

This is still under development. That doc mentions a repack_fields function, but that doesn't exist yet.
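In later NumPy releases numpy.lib.recfunctions.repack_fields does exist (it is not available in 1.14). A sketch, assuming such a version: repacking drops the unselected bytes, at the cost of a copy, after which the view lines up:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

# repack_fields returns a copy whose records hold only 'a' and 'b' (2 * 4 = 8 bytes),
# so viewing as float32 no longer picks up 'c' and 'd'.
packed = rfn.repack_fields(arr[['a', 'b']])
out = packed.view(np.float32).reshape(packed.shape[0], -1)
```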


4 Comments

Thank you so much for your answer! I've realized that I didn't pose my problem correctly. You are right, the solution in my OP does indeed not result in an error in my given example. I posed the example incorrectly. Very sorry for that! I updated the example with a slight modification. Now it both represents my actual use case and unfortunately definitely throws the error :/
Due to recent changes in multi-field selection, you can't do that.
Thanks a lot for the edited answer, and thanks for pointing me towards the relevant documentation! I should have looked more thoroughly. I guess I'll resort to converting to a list for now and then see what NumPy 1.15 brings.
Version 1.17 implements the new multi field view approach. Read the docs and release notes.

This is now possible with numpy.lib.recfunctions.structured_to_unstructured (added in NumPy 1.16, so it is not available in 1.14).

With your example:

import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

out = rfn.structured_to_unstructured(arr[['a', 'b']])
print(repr(out))  # array([[105.,  34.]], dtype=float32)
