
I have a NumPy record array of floats:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)], 
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

How can I determine min/max from this record array? My usual attempt of ar.min() fails with:

TypeError: cannot perform reduce with flexible type

I'm not sure how to flatten the values out into a simpler NumPy array.

4 Answers

5

The easiest and most efficient way is probably to view your array as a simple 2D array of floats:

ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

which is a 2D array view on the structured array:

print(ar_view.min(axis=0))  # Or whatever…

This method is fast, as no new array is created (changes to ar_view result in changes to ar). It is restricted to cases like yours, though, where all record fields have the same type (float32, here).

One advantage is that this method keeps the 2D structure of the original array intact: you can find the minimum in each "column" (axis=0), for instance.
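
For reference, here is a minimal self-contained sketch of the view approach, reusing the array from the question. The `np.shares_memory` check is my own addition, included only to illustrate the "no new array is created" point; it assumes all fields share one dtype, as they do here.

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# Reinterpret each record (three float32 fields) as a row of three float32 values.
ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

print(ar_view.shape)                 # (3, 3)
print(ar_view.min(axis=0))           # minimum of each field ("column")
print(ar_view.min(), ar_view.max())  # overall minimum and maximum

# The view and the structured array refer to the same buffer.
print(np.shares_memory(ar, ar_view))
```

If the fields had different dtypes, the `view()` call would not give a meaningful result, so this sketch only applies to the uniform case.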


4 Comments

I get an error with float: "ValueError: new type not compatible with array." However, if I use a NumPy float data type like ar.dtype[0] (or dtype('float32')), success!
ar.view((ar.dtype[0], len(ar.dtype)))
I guess now we would use structured_to_unstructured?
This is an interesting comment. Note that structured_to_unstructured creates a new array and is therefore not fully equivalent to this answer (and is slower).
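
For anyone who wants to try the structured_to_unstructured route mentioned in these comments, a minimal sketch follows. It uses `numpy.lib.recfunctions.structured_to_unstructured` on the array from the question. Whether the result shares memory with `ar` is not assumed here: the function takes a `copy` argument and may return either a view or a new array depending on the NumPy version and field layout, so it is worth checking with `np.shares_memory` if that matters to you.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# Convert the structured array into a plain 2D array, one column per field.
flat = structured_to_unstructured(ar)

print(flat.shape)              # (3, 3)
print(flat.min(axis=0))        # minimum of each field
print(flat.min(), flat.max())  # overall minimum and maximum
```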
3

You can do

# construct a flattened ndarray (np.hstack needs a sequence, not a generator)
arnew = np.hstack([ar[r] for r in ar.dtype.names])

to flatten the recarray. You can then perform your normal ndarray operations, like

armin, armax = np.min(arnew), np.max(arnew)
print(armin, armax)

The results are

237.0 238.05

Basically, ar.dtype.names gives you the tuple of field names; you then retrieve each field's array by name and stack them all into arnew.

2 Comments

np.hstack() is useful if the different fields of the structured array do not have the same type, which is not the case here. For this question, the view() approach (see my answer) is way faster, and also has the advantage of keeping the 2D structure of the original array intact.
@EOL Yep, I thought the OP wanted a flattened ndarray, so I suggested using hstack(). Otherwise, if the dtypes are uniform and only min/max are needed, sure, view is a lot better.
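
To illustrate the mixed-type case discussed in these comments, here is a small sketch. The array `mixed` and its field names `n` and `x` are hypothetical, made up for this example and not taken from the question. `np.hstack` promotes the fields to a common dtype (float64 here) while flattening.

```python
import numpy as np

# Hypothetical structured array whose fields have different numeric types.
mixed = np.array([(1, 2.5), (3, 0.5)],
                 dtype=[('n', 'i4'), ('x', 'f8')])

# Stack the per-field arrays into one flat array; NumPy picks a common dtype.
flat = np.hstack([mixed[name] for name in mixed.dtype.names])

print(flat.dtype)              # float64
print(flat.min(), flat.max())  # 0.5 3.0
```

Note that the result is one-dimensional, so the per-field structure is lost, as pointed out above.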
2

This may help someone else down the line. Another way to do it, which may be more sensible, is to view the array as a recarray and work on one field at a time:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
              (238.02, 238.0, 237.01),
              (238.05, 238.01, 237.0)], 
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])
arView = ar.view(np.recarray)
arView.A.min()

which allowed me to just pick and choose. A problem on my end was that the dtypes of my fields were not all the same (a rather complicated struct by and large).
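
If you want the min/max of every field rather than one picked by hand, the same per-field idea can be looped over `ar.dtype.names`. This is only a sketch: the dictionary name `extremes` is my own, and it assumes every field is numeric (string or nested fields would need to be skipped).

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# Map each field name to its (min, max); fields may have different dtypes.
extremes = {name: (ar[name].min(), ar[name].max())
            for name in ar.dtype.names}

for name, (lo, hi) in extremes.items():
    print(name, lo, hi)
```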


0

A modern approach could leverage pandas to read and process the record array, then convert back to NumPy:

import pandas as pd

# read record array as a data frame, process data
df = pd.DataFrame(ar)
df_min = df.min(axis=0)

# convert to a uniform array
df_min.to_numpy()
# array([238.02, 238.  , 237.  ], dtype=float32)

# convert to a record array
df_min.to_frame().T.to_records(index=False)
# rec.array([(238.02, 238., 237.)],
#           dtype=[('A', '<f4'), ('B', '<f4'), ('C', '<f4')])
