4

In brief

In Python 3.6 and using Numpy, what would be the most efficient way to rearrange the elements of a 2D array according to indices present in a different, similarly shaped, index 2D array?

Detailed

Suppose I have the following two 9 x 5 arrays, called A and B:

import numpy as np
A = np.array([[0.32, 0.35, 0.88, 0.63, 1.  ],
              [0.23, 0.69, 0.98, 0.22, 0.96],
              [0.7 , 0.51, 0.09, 0.58, 0.19],
              [0.98, 0.42, 0.62, 0.94, 0.46],
              [0.48, 0.59, 0.17, 0.23, 0.98]])

B = np.array([[4, 0, 3, 2, 1],
              [3, 2, 4, 1, 0],
              [4, 3, 0, 2, 1],
              [4, 2, 0, 3, 1],
              [0, 3, 1, 2, 4]])

I can successfully rearrange A using B as an index array by it by np.array(list(map(lambda i, j: j[i], B, A))):

array([[1.  , 0.32, 0.63, 0.88, 0.35],
       [0.22, 0.98, 0.96, 0.69, 0.23],
       [0.19, 0.58, 0.7 , 0.09, 0.51],
       [0.46, 0.62, 0.98, 0.94, 0.42],
       [0.48, 0.23, 0.59, 0.17, 0.98]])

However, when the dimensions of A and B increase, such a solution becomes really inefficient. If I am not mistaken, that is because:

  • using the lambda loops over all rows of A instead of relying on Numpy vectorizations
  • mapping is slow
  • converting list to array eats precious time.

Since in my real use case those arrays can grow quite big, and I have to reorder many of them in a long loop, a lot of my current performance bottleneck (measured with a profiler) comes from that single line of code above.

My question: what would the most efficient, more Numpy-smart way of achieving the above?

A toy code to test general arrays and time the process could be:

import numpy as np
nRows = 20000
nCols = 10000
A = np.round(np.random.uniform(0, 1, (nRows, nCols)), 2)
B = np.full((nRows, nCols), range(nCols))
for r in range(nRows):
    np.random.shuffle(B[r])
%time X = np.array(list(map(lambda i, j: j[i], B, A)))
11
  • 4
    np.take_along_axis(A,B,1)? Commented Jul 4, 2020 at 21:26
  • A[ np.arange(5)[:,None],B] should also work, but take_along is easier (if you remember it exists :) ). Commented Jul 4, 2020 at 21:40
  • @PaulPanzer I made some tests and the take_along_axis function is actually slower tha a FOR loop. Mystery... Commented Jul 4, 2020 at 21:47
  • Oops! Are your arrays rather small? What about @hpaulj's suggestion? Commented Jul 4, 2020 at 21:49
  • 1
    @PaulPanzer oh, it wasn't me (the OP) who commented before. My arrays can be rather big, significantly bigger than 20000 x 10000. I am playing with @bousof's suggestion, and it does seem that the loop becomes the most attractive for big nCols. take_along_axis and @hpaulj's are faster as nCols decreases Commented Jul 4, 2020 at 22:06

1 Answer 1

2

A comparison with three other possibilities:

import numpy as np
import time

# Input
nRows = 20000
nCols = 10000
A = np.round(np.random.uniform(0, 1, (nRows, nCols)), 2)
B = np.full((nRows, nCols), range(nCols))
for r in range(nRows):
  np.random.shuffle(B[r])

# Original
t_start = time.time()
X = np.array(list(map(lambda i, j: j[i], B, A)))
print('Timer 1:', time.time()-t_start, 's')

# FOR loop
t_start = time.time()
X = np.zeros((nRows, nCols))
for i in range(nRows):
  X[i] = A[i][B[i]]
print('Timer 2:', time.time()-t_start, 's')

# take_along_axis
t_start = time.time()
X = np.take_along_axis(A,B,1)
print('Timer 3:', time.time()-t_start, 's')

# Indexing
t_start = time.time()
X = A[ np.arange(nRows)[:,None],B]
print('Timer 4:', time.time()-t_start, 's')

Ouput:

% python3 script.py
Timer 1: 2.191567897796631 s
Timer 2: 1.3516249656677246 s
Timer 3: 1.675267219543457 s
Timer 4: 1.646852970123291 s

For low number of columns (nRows,nCols)=(200000,10) the results are completely different however:

% python3 script.py
Timer 1: 0.2729799747467041 s
Timer 2: 0.22678399085998535 s
Timer 3: 0.016162633895874023 s
Timer 4: 0.014748811721801758 s
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.