NumPy: get min/max from record array of numeric values

Question

I have a NumPy record array of floats:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)], 
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

How can I determine the min/max of this record array? My usual attempt, ar.min(), fails with:

TypeError: cannot perform reduce with flexible type

I'm not sure how to flatten the values out into a simpler NumPy array.
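It helps to look at what ar actually is. Here is a minimal sketch that reuses the array from the question. ar is a one-dimensional array of three records, not a 3x3 grid of floats. NumPy defines no min reduction over the structured record type, which is why ar.min() fails. Each individual field, however, is an ordinary float32 array.

```python
import numpy as np

# Same record array as in the question; 'f' means float32
ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

print(ar.shape)        # one axis holding 3 records, not a 3x3 grid
print(ar.dtype.names)  # the field names
print(ar['A'])         # a single field is a plain 1D float32 array
print(ar['A'].min())   # reductions work fine on a single field
```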

Community · Accepted Answer · 2017-05-23 11:45:13Z

5

The easiest and most efficient way is probably to view your array as a simple 2D array of floats:

ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

which is a 2D array view on the structured array:

print(ar_view.min(axis=0))  # Or whatever…

This method is fast because no data is copied: ar_view shares memory with ar, so changes to ar_view result in changes to ar. It is restricted to cases like yours, though, where all record fields have the same type (float32, here).
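The restriction can be made explicit with a small guard. The sketch below uses a hypothetical helper, as_2d_view, which is not a NumPy function. It assumes the view is only safe when every field has the same dtype and the fields are packed back to back with no padding, and it raises ValueError otherwise.

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])


def as_2d_view(rec):
    """Hypothetical helper: return a 2D view of a structured array.

    Raises ValueError unless every field has the same dtype and the fields
    are packed back to back with no padding, which is the case the view needs.
    """
    names = rec.dtype.names
    if not names:
        raise ValueError("not a structured array")
    base = rec.dtype[0]
    for i, name in enumerate(names):
        field_dtype, offset = rec.dtype.fields[name][:2]
        # Same type as the first field, starting right after the previous field
        if field_dtype != base or offset != i * base.itemsize:
            raise ValueError("fields are not uniform and contiguous")
    if rec.dtype.itemsize != len(names) * base.itemsize:
        raise ValueError("records contain padding bytes")
    return rec.view((base, len(names)))


print(as_2d_view(ar).min(axis=0))

# Mixed field types: the view approach does not apply here
mixed = np.zeros(2, dtype=[('n', 'i4'), ('x', 'f8')])
try:
    as_2d_view(mixed)
except ValueError as err:
    print("cannot view:", err)
```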

One advantage is that this method keeps the 2D structure of the original array intact: you can find the minimum in each "column" (axis=0), for instance.
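Below is a self-contained sketch of the view approach that reuses the question's array. It computes per-column and overall extrema. The final write shows that changes made through ar_view appear in ar.

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# View each 12-byte record as 3 float32 values: shape (3,) becomes (3, 3).
# ar.dtype[0] is float32; Python's float (float64) would not match the record size.
ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

col_min = ar_view.min(axis=0)              # minimum of each field ("column")
col_max = ar_view.max(axis=0)              # maximum of each field
overall = (ar_view.min(), ar_view.max())   # over all values
print(col_min, col_max, overall)

# The view shares memory with ar: writing through it changes ar
ar_view[0, 2] = 236.5
print(ar['C'][0])
```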

edited May 23, 2017 at 11:45

CommunityBot


answered Jul 4, 2012 at 5:58

Eric O. Lebigot


4 Comments

Mike T Over a year ago

I get an error with float: "ValueError: new type not compatible with array." However, if I use a NumPy float data type like ar.dtype[0] (or dtype('float32')), success!

Mike T Over a year ago

ar.view((ar.dtype[0], len(ar.dtype)))

djvg Over a year ago

I guess now we would use structured_to_unstructured?

Eric O. Lebigot Over a year ago

This is an interesting comment. Note that structured_to_unstructured creates a new array and is therefore not fully equivalent to this answer (and is slower).
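For comparison, here is a minimal sketch of the structured_to_unstructured route from the comments. It assumes NumPy 1.16 or later, where numpy.lib.recfunctions.structured_to_unstructured is available. Whether the result is a copy or a view depends on the NumPy version and on the function's copy argument, so check the documentation for the version you use. The sketch relies only on the values.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# Plain 2D array with one column per field; the dtype is the fields' common type
plain = rfn.structured_to_unstructured(ar)
print(plain.dtype, plain.shape)
print(plain.min(axis=0))          # per-field minima
print(plain.min(), plain.max())   # overall extrema
```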

nye17 · Accepted Answer · 2012-07-04 01:35:03Z

3

You can do

# construct flattened ndarray (pass a list: newer NumPy versions deprecate or reject generators here)
arnew = np.hstack([ar[r] for r in ar.dtype.names])

to flatten the record array. You can then perform ordinary ndarray operations on it, such as

armin, armax = np.min(arnew), np.max(arnew)
print(armin, armax)

The result is:

237.0 238.05

Basically, ar.dtype.names gives you the tuple of field names. You then retrieve each field's array by name and stack them into arnew.
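If you want per-field results without using a view, you can stack the fields side by side instead of end to end. This sketch uses np.column_stack in place of np.hstack. Like hstack, it copies the data, but it keeps one column per field.

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# Each 1D field becomes one column of a new (3, 3) array (a copy, not a view)
cols = np.column_stack([ar[name] for name in ar.dtype.names])
print(cols.shape)
print(cols.min(axis=0))         # per-field minima
print(cols.min(), cols.max())   # overall extrema
```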

edited Jul 4, 2012 at 1:35

answered Jul 4, 2012 at 1:26

nye17


2 Comments

Eric O. Lebigot Over a year ago

np.hstack() is useful if the different fields of the structured array do not have the same type, which is not the case here. For this question, the view() approach (see my answer) is way faster, and also has the advantage of keeping the 2D structure of the original array intact.

nye17 Over a year ago

@EOL yep, I thought the OP wanted a flattened ndarray, so I suggested using hstack(). But if the dtypes are uniform and only min/max are needed, sure, view is a lot better.
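As noted in the comments above, hstack still works when the fields differ in type and the view approach does not. Here is a sketch with a hypothetical two-field record type. When flattening, NumPy casts the fields to a common type, which is float64 for int32 and float64.

```python
import numpy as np

# Hypothetical record array whose fields have different types
mixed = np.array([(1, 2.5), (7, -0.5)], dtype=[('n', 'i4'), ('x', 'f8')])

# hstack casts every field to a common dtype while flattening
flat = np.hstack([mixed[name] for name in mixed.dtype.names])
print(flat.dtype)
print(flat.min(), flat.max())
```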

kratsg · Accepted Answer · 2015-06-02 03:22:40Z

2

This may help someone else down the line, but another way to do it that may be more sensible:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])
arView = ar.view(np.recarray)
arView.A.min()

This allowed me to pick out individual fields by name. A complication on my end was that my fields did not all have the same dtype (a rather complicated struct, by and large).
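For the mixed-type case, here is a sketch with a hypothetical struct that stands in for a "complicated struct". It shows that field access by name works on a plain structured array too, with or without the recarray view. It also computes per-field minima for only the numeric fields and skips others, such as a bytes field.

```python
import numpy as np

# Hypothetical struct with mixed field types
rec = np.array([(3, 1.5, b'x'), (1, 2.5, b'y')],
               dtype=[('id', 'i4'), ('val', 'f8'), ('tag', 'S1')])

print(rec['val'].min())                  # indexing by field name
print(rec.view(np.recarray).val.min())   # attribute access, as in this answer

# Per-field minima, keeping only numeric fields
numeric = [n for n in rec.dtype.names if np.issubdtype(rec.dtype[n], np.number)]
mins = {n: rec[n].min() for n in numeric}
print(mins)
```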

answered Jun 2, 2015 at 3:22

kratsg


Mike T · Accepted Answer · 2022-04-17 10:27:47Z

0

A modern approach could leverage pandas to read and process the record array, then convert back to NumPy:

import pandas as pd

# read record array as a data frame, process data
df = pd.DataFrame(ar)
df_min = df.min(axis=0)

# convert to a uniform array
df_min.to_numpy()
# array([238.02, 238.  , 237.  ], dtype=float32)

# convert to a record array
df_min.to_frame().T.to_records(index=False)
# rec.array([(238.02, 238., 237.)],
#           dtype=[('A', '<f4'), ('B', '<f4'), ('C', '<f4')])
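If pandas is not available, a similar per-field result can be obtained with NumPy alone. This is a substitute for the pandas route, not part of it. The sketch assumes that a one-record structured array with the same dtype as ar is an acceptable output.

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# One-record structured array with the same fields and dtype as ar
ar_min = np.empty(1, dtype=ar.dtype)
for name in ar.dtype.names:
    ar_min[name] = ar[name].min()   # fill each field with its own minimum

ar_min_rec = ar_min.view(np.recarray)  # record-array view, if attribute access is wanted
print(ar_min_rec)
```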

answered Apr 17, 2022 at 10:27

Mike T
