Sorting values in numpy structured array based on field name value

Question

I have the following structured array:

import numpy as np

x = np.rec.array([(22,2,200.,2000.), (44,2,400.,4000.), (55,5,500.,5000.), (33,3,400.,3000.)],
              dtype={'names':['subcase','id', 'vonmises','maxprincipal'], 'formats':['i4','i4','f4','f4']})

I am trying to get the max vonmises for each id.

For example the max vonmises for id 2 would be 400. And i do want the corresponding subcase, and maxprincipal.

Here is what i have done so far:

print repr(x[['subcase','id','vonmises']][(x['id']==2) & (x['vonmises']==max(x['vonmises'][x['id']==2]))])

Here is the output:

array([(44, 2, 400.0)], 
  dtype=(numpy.record, [('subcase', '<i4'), ('id', '<i4'), ('vonmises', '<f4')]))

The issue i am having now is that i want this to work for all ids that are in the array, not just id=2.

i.e. want to get the following output:

array([(44, 2, 400.0),(55, 5, 500.0),(33, 3, 400.0)], 
  dtype=(numpy.record, [('subcase', '<i4'), ('id', '<i4'), ('vonmises', '<f4')]))

Is there a nice way to accomplish this without specifying each individual id?

hpaulj · Accepted Answer · 2016-04-30 16:26:52Z

Here's an approach using np.sort (or argsort) followed by itertools.groupby. But this grouping tools produces a generator of generators, which is messier to work with.

In [29]: x = np.rec.array([(22,2,200.,2000.), (44,2,400.,4000.), (55,5,500.,5000.), (33,3,400.,3000.)],
              dtype={'names':['subcase','id', 'vonmises','maxprincipal'], 'formats':['i4','i4','f4','f4']})

In [30]: ind=x.argsort(order=['id','vonmises'])

In [31]: ind
Out[31]: 
rec.array([0, 1, 3, 2], 
          dtype=int32)

In [32]: x[ind]
Out[32]: 
rec.array([(22, 2, 200.0, 2000.0), (44, 2, 400.0, 4000.0), (33, 3, 400.0, 3000.0),
 (55, 5, 500.0, 5000.0)], 
          dtype=[('subcase', '<i4'), ('id', '<i4'), ('vonmises', '<f4'), ('maxprincipal', '<f4')])

In [33]: import itertools

In [34]: [list(v) for k,v in itertools.groupby(x[ind],lambda i:i['id'])]
Out[34]: 
[[(22, 2, 200.0, 2000.0), (44, 2, 400.0, 4000.0)],
 [(33, 3, 400.0, 3000.0)],
 [(55, 5, 500.0, 5000.0)]]

Then we have to fetch the last (or first for min) record of each group, and then reconstitute the recarray.

In [39]: mx=[list(v)[-1] for k,v in itertools.groupby(x[ind],lambda i:i['id'])]

In [43]: np.rec.fromrecords(mx,dtype=x.dtype)
Out[43]: 
rec.array([(44, 2, 400.0, 4000.0), (33, 3, 400.0, 3000.0), (55, 5, 500.0, 5000.0)], 
          dtype=[('subcase', '<i4'), ('id', '<i4'), ('vonmises', '<f4'), ('maxprincipal', '<f4')])

Elements of mx are np.record with the correct dtype, but mx itself is a list.

Or compactly:

g=itertools.groupby(np.sort(x,order=['id','vonmises']), lambda i:i['id'])
np.rec.fromrecords([list(v)[-1] for k,v in g], dtype=x.dtype)

Colonel Beauvel · Accepted Answer · 2016-04-30 15:42:24Z

2

I do not know why you use this format but here is a hack with pandas:

import pandas as pd

df  = pd.DataFrame(x)
df_ = df.groupby('id')['vonmises'].max().reset_index()

In [213]: df_.merge(df, on=['id','vonmises'])[['id','vonmises','subcase']]

Out[213]:
array([[   2.,  400.,   44.],
       [   3.,  400.,   33.],
       [   5.,  500.,   55.]], dtype=float32)

answered Apr 30, 2016 at 15:42

Colonel Beauvel

31.3k11 gold badges49 silver badges88 bronze badges

5 Comments

snowleopard Over a year ago

Thank you very much, can you please elaborate on the format? What format would you recommend?

Colonel Beauvel Over a year ago

I dunno what is your underlying problem you are trying to solve, but for grouping operations, typically, taking the max per group and filtering on the max of the groups, I would recommend to go for pandas DataFrame representation, like df and df_ instead of these rec array.

hpaulj Over a year ago

np.sort lets you specify sorting order based on fields, e.g. ['id','vonmises']. But then you have you to use something like intertools.groupby to extract the first or last of each group. Possible, but messier than in pandas.

Colonel Beauvel Over a year ago

may I know the reason of a downvote for this alternative solution?

snowleopard Over a year ago

I upvoted you, but maybe the reason there was a downvote was because you proposed pandas when my question was regarding numpy. But I think it is a nice proposal. It's always good to know what's out there.

Eric · Accepted Answer · 2016-04-30 17:26:35Z

2

Here's an approach without groupby:

# sort as desired
x.sort(order=['id','vonmises'])

# keep the first element, and every element with a different id to the one before it
keep = np.empty(x.shape, dtype=np.bool)
keep[0] = True
keep[1:] = x[:-1].id != x[1:].id

x_filt = x[keep]

Which gives:

rec.array([(22, 2, 200.0, 2000.0), (33, 3, 400.0, 3000.0), (55, 5, 500.0, 5000.0)], 
      dtype=[('subcase', '<i4'), ('id', '<i4'), ('vonmises', '<f4'), ('maxprincipal', '<f4')])

answered Apr 30, 2016 at 17:26

Eric

98.1k54 gold badges257 silver badges389 bronze badges

Comments

Eelco Hoogendoorn · Accepted Answer · 2016-04-30 17:50:59Z

2

Using the numpy_indexed package, this would be a simple one-liner:

import numpy_indexed as npi
ids, maxvonmises = npi.group_by(x.id).max(x.vonmises)

Probably similar performance to pandas, but a lot more readable, and no need to adapt your dataformat to the problem at hand.

answered Apr 30, 2016 at 17:50

Eelco Hoogendoorn

10.8k1 gold badge46 silver badges43 bronze badges

Collectives™ on Stack Overflow

Sorting values in numpy structured array based on field name value

4 Answers 4

Comments

5 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

5 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related