3

I have a 3D array of data. I have a 2D array of indices, where the shape matches the first two dimensions of the data array, and it specfies the indices I want to pluck from the data array to make a 2D array. eg:

 from numpy import *
 a = arange(3 * 5 * 7).reshape((3,5,7))
 getters = array([0,1,2] * (5)).reshape(3,5)

What I'm looking for is a syntax like a[:, :, getters] which returns an array of shape (3,5) by indexing independently into the third dimension of each item. However, a[:, :, getters] returns an array of shape (3,5,3,5). I can do it by iterating and building a new array, but this is pretty slow:

 array([[col[getters[ri,ci]] for ci,col in enumerate(row)] for ri,row in enumerate(a)])
 # gives array([[  0,   8,  16,  21,  29],
 #    [ 37,  42,  50,  58,  63],
 #    [ 71,  79,  84,  92, 100]])

Is there a neat+fast way?

1 Answer 1

3

If I understand you correctly, I've done something like this using fancy indexing:

>>> k,j = np.meshgrid(np.arange(a.shape[1]),np.arange(a.shape[0]))
>>> k
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
>>> j
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2]])
>>> a[j,k,getters]
array([[  0,   8,  16,  21,  29],
       [ 37,  42,  50,  58,  63],
       [ 71,  79,  84,  92, 100]])

Of course, you can keep k and j around and use them as often as you'd like. As pointed out by DSM in comments below, j,k = np.indices(a.shape[:2]) should also work instead of meshgrid. Which one is faster (apparently) depends on the number of elements you are using.

Sign up to request clarification or add additional context in comments.

5 Comments

Minor tweak: j,k = np.indices(a.shape[:2]).
indices is a very nice function, but to a short test it is 2 or 3 times less performant than meshgrid
@EnricoGiampieri: true enough, although np.indices wins for larger arrays; I get crossover at about 10^6 elements.
@DMS I guess it depends on the machine, for 10^6 elements ( shape (50,50,50) ) I obtain that np.indices takes twice the time. it's hard to say from the source code which one should perform better under which conditions...got any insight?
@DSM -- Thanks for that. I had actually tried to solve this problem a while back and the solution I posted above was the best I could come up with. As with many things numpy, it's neat to see that there are different ways of doing it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.