2

I need to sort a multidimensional array according to the values in the first sub-array, as fast as possible (the line is applied millions of times).

Below is my original line, and my attempt at improving its performance which is not working. As far as I can see, my numpy approach is only sorting properly the first sub-array, and none of the remaining ones.

What am I doing wrong and how can I improve the performance of the sorting?

import numpy as np

# Generate some random data.
# I receive the actual data as a list, hence the .tolist()
aa = np.random.rand(10, 2000).tolist()

# This is the original line I need to process faster.
b1 = zip(*sorted(zip(*aa), key=lambda x: x[0]))

# This is my attempt at improving the above line's performance
b2 = np.sort(np.asarray(aa).T, axis=0).T

# Check if all sub-arrays are equal
for a, b in zip(*[b1, b2]):
    print(np.array_equal(a, b))
2
  • 1
    Right off the bat, you can try replacing lambda x: x[0] with operator.itemgetter(0). Commented Aug 30, 2017 at 13:03
  • Thanks, I'll try that now. But why is the numpy approach not working? What am I doing wrong? Commented Aug 30, 2017 at 13:04

1 Answer 1

4

Still a novice when it comes to lambdas, but from what little I understand from your code - It seems in your lambda method, you are using x[0] to get the sort keys and then using those to pull values off each element in aa. In NumPy terms, that translates to getting the sort indices for the first row in the array version and then indexing into each row (since each element of aa becomes each row of array a). That's basically column-indexing. Also, it seems sorted maintains order for identical elements. So, we need to use argsort(kind='mergesort').

Thus, we can simply do -

a[:, a[0].argsort(kind='mergesort')] # a = np.array(aa) 

In your NumPy code, you are doing nothing of those sorts, so not giving the correct results.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! This is ~20X faster than my original approach. Could you explain what am I doing wrong with my use of numpy and transposing? That way I can learn from my mistakes :)
Thank you very much for the explanation Divakar!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.