Fast sort multidimensional array

Question

I need to sort a multidimensional array according to the values in the first sub-array, as fast as possible (the line is applied millions of times).

Below is my original line, and my attempt at improving its performance which is not working. As far as I can see, my numpy approach is only sorting properly the first sub-array, and none of the remaining ones.

What am I doing wrong and how can I improve the performance of the sorting?

import numpy as np

# Generate some random data.
# I receive the actual data as a list, hence the .tolist()
aa = np.random.rand(10, 2000).tolist()

# This is the original line I need to process faster.
b1 = zip(*sorted(zip(*aa), key=lambda x: x[0]))

# This is my attempt at improving the above line's performance
b2 = np.sort(np.asarray(aa).T, axis=0).T

# Check if all sub-arrays are equal
for a, b in zip(*[b1, b2]):
    print(np.array_equal(a, b))

Right off the bat, you can try replacing lambda x: x[0] with operator.itemgetter(0). — chepner
– chepner, Commented Aug 30, 2017 at 13:03
Thanks, I'll try that now. But why is the numpy approach not working? What am I doing wrong? — Gabriel
– Gabriel, Commented Aug 30, 2017 at 13:04

Divakar · Accepted Answer · 2017-08-30 17:49:26Z

4

Still a novice when it comes to lambdas, but from what little I understand from your code - It seems in your lambda method, you are using x[0] to get the sort keys and then using those to pull values off each element in aa. In NumPy terms, that translates to getting the sort indices for the first row in the array version and then indexing into each row (since each element of aa becomes each row of array a). That's basically column-indexing. Also, it seems sorted maintains order for identical elements. So, we need to use argsort(kind='mergesort').

Thus, we can simply do -

a[:, a[0].argsort(kind='mergesort')] # a = np.array(aa)

In your NumPy code, you are doing nothing of those sorts, so not giving the correct results.

edited Aug 30, 2017 at 17:49

answered Aug 30, 2017 at 13:07

Divakar

222k19 gold badges273 silver badges374 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Gabriel Over a year ago

Thanks! This is ~20X faster than my original approach. Could you explain what am I doing wrong with my use of numpy and transposing? That way I can learn from my mistakes :)

Gabriel Over a year ago

Thank you very much for the explanation Divakar!

Collectives™ on Stack Overflow

Fast sort multidimensional array

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related