9

it looks like sorting numpy structured and record arrays by a single column is much slower than doing a sort on a similar standalone array:

In [111]: a = np.random.rand(1e4)

In [112]: b = np.random.rand(1e4)

In [113]: rec = np.rec.fromarrays([a,b])

In [114]: timeit rec.argsort(order='f0')
100 loops, best of 3: 18.8 ms per loop

In [115]: timeit a.argsort()
1000 loops, best of 3: 891 µs per loop

There is a marginal improvement using the structured array, but it's not dramatic:

In [120]: struct = np.empty(len(a),dtype=[('a','f8'),('b','f8')])

In [121]: struct['a'] = a

In [122]: struct['b'] = b

In [124]: timeit struct.argsort(order='a')
100 loops, best of 3: 15.8 ms per loop

This indicates that it's potentially faster to create an index array from argsort and then use that to reorder the individual arrays. This is OK except that I expect to be dealing with very large arrays and would like to avoid copying data as much as possible. Is there a more efficient way of doing this that I'm missing?

2 Answers 2

4

What´s slowing you is the use of order, not the fact that you have a record array. If you want to sort by a single field, do it like this:

In [12]: %timeit np.argsort(rec['f0'])
1000 loops, best of 3: 829 us per loop

Once order is used, performance goes south no matter how many fields you want to sort by:

In [16]: %timeit np.argsort(rec, order=['f0'])
10 loops, best of 3: 27.9 ms per loop

In [17]: %timeit np.argsort(rec, order=['f0', 'f1'])
10 loops, best of 3: 28.4 ms per loop
Sign up to request clarification or add additional context in comments.

2 Comments

Aha! I figured order did the np.argsort() under the hood, but I guess not?
but actually, this doesn't solve the issue of copying the data -- it will require me to pass the indices returned by argsort which will result in a copy.
3

As Jaime have said, you can use argsort to sort the record array.

inds = np.argsort(rec['f0'])

And use take to avoid making a copy

np.take(rec, inds, out=rec)

1 Comment

The only reason that works is because np.take makes a copy when you specify an out parameter and leave the mode one in its default 'raise' state, you can look at the source. If you use another mode there will be no copy, but the output will be rubbish, with some values repeated several times and others missing altogether.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.