How to efficiently index into a 1D numpy array via slice ranges

Question

I have a big 1D array of data. I also have a starts array of indexes into that data where important things happened. I want an array of ranges, giving me one window of a fixed length for each starting point in starts. Bogus sample data:

data = np.linspace(0,10,50)
starts = np.array([0,10,21])
length = 5

Instinctively, I want to do something like

data[starts:starts+length]

But really, I need to turn starts into a 2D array of range "windows." Coming from functional languages, I would think of it as a map from a list to a list of lists, like:

np.apply_along_axis(lambda i: np.arange(i,i+length), 0, starts)

But that won't work: with a 1D input, apply_along_axis passes the whole array to the function in a single call, not one element at a time.

You can do this:

pairs = np.vstack([starts, starts + length]).T
ranges = np.apply_along_axis(lambda p: np.arange(*p), 1, pairs)
data[ranges]

Or you can do it with a list comprehension:

data[np.array([np.arange(i,i+length) for i in starts])]

Or you can do it iteratively. (Bleh.)

Is there a concise, idiomatic way to slice into an array at certain start points like this? (Pardon the numpy newbie-ness.)

Note: this is not the same problem as stackoverflow.com/questions/12589923/… - that question has irregular lengths. Here, what I'm doing should fit fine in a normal rectangular array.
– Dan Fitch, commented Mar 13, 2015 at 19:48

Alex · Accepted Answer · 2015-03-13 21:13:53Z

4

import numpy as np

data = np.linspace(0, 10, 50)
starts = np.array([0, 10, 21])
length = 5

For a NumPy-only way of doing this, you can use numpy.meshgrid(), as described here:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html

As hpaulj pointed out in the comments, meshgrid isn't actually needed for this problem, because array broadcasting does the same job:

http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

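# Original meshgrid version, kept for reference:
# indices = sum(np.meshgrid(np.arange(length), starts))

# Broadcasting version: a (3, 1) column of starts plus a (5,) row of offsets
indices = np.arange(length) + starts[:, np.newaxis]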
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [21, 22, 23, 24, 25]])
data[indices]

returns

array([[ 0.        ,  0.20408163,  0.40816327,  0.6122449 ,  0.81632653],
       [ 2.04081633,  2.24489796,  2.44897959,  2.65306122,  2.85714286],
       [ 4.28571429,  4.48979592,  4.69387755,  4.89795918,  5.10204082]])
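To see why the broadcast version works, here is a minimal sketch. The names offsets, col, windows and expected are introduced only for illustration and are not part of the original answer. A column of shape (3, 1) added to a row of shape (5,) broadcasts to a (3, 5) index array, and indexing with it should match the list comprehension from the question.

```python
import numpy as np

data = np.linspace(0, 10, 50)
starts = np.array([0, 10, 21])
length = 5

offsets = np.arange(length)      # shape (5,): 0, 1, 2, 3, 4
col = starts[:, np.newaxis]      # shape (3, 1); starts[:, None] is equivalent
indices = col + offsets          # broadcasts to shape (3, 5)

windows = data[indices]          # integer-array indexing returns a (3, 5) copy

# Same result as the list comprehension from the question
expected = np.array([data[i:i + length] for i in starts])
```

Note that every start must satisfy start + length <= len(data); otherwise the index array contains out-of-range values and the lookup raises an IndexError.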

edited Mar 13, 2015 at 21:13

answered Mar 13, 2015 at 19:33

Alex


3 Comments

doug Over a year ago

Nice one, +1 from me. As far as I know, this is the most concise and elegant way to do this.

hpaulj Over a year ago

indices=np.arange(length)+starts[:,None] also works

Alex Over a year ago

@hpaulj Doh! That's much smoother. Updated answer.

HYRY · Accepted Answer · 2015-03-14 11:47:07Z

2

If you need to do this many times, you can use as_strided() to create a sliding-window view of data:

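import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.linspace(0, 10, 50000)
length = 5
starts = np.random.randint(0, len(data) - length, 10000)

# View with one row per possible start; row i is data[i:i + length].
# Both strides equal itemsize, so each row begins one element after the previous.
sliding_window = as_strided(data,
                            shape=(len(data) - length + 1, length),
                            strides=(data.itemsize, data.itemsize))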

Then you can use:

sliding_window[starts]

to get what you want.

The sliding window is a view, so building it copies no data, and indexing it with starts is also faster than creating the index array.
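As a hedged alternative, assuming NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same kind of view without hand-computed strides, which removes the risk of reading outside the array if a shape or stride passed to as_strided is wrong. The sketch below is not from the original answer; the seeded generator and the names windows_view, result and reference are assumptions added for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.linspace(0, 10, 50000)
length = 5

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
# Valid starts run from 0 to len(data) - length inclusive
starts = rng.integers(0, len(data) - length + 1, size=10000)

# Read-only view of shape (len(data) - length + 1, length); no data is copied
windows_view = sliding_window_view(data, length)

# Indexing the view with an integer array copies only the selected rows
result = windows_view[starts]

# Cross-check against the broadcast index array
reference = data[starts[:, np.newaxis] + np.arange(length)]
```

The view is read-only by default, which is usually what you want here, since overlapping windows share memory and a write through one row would show up in its neighbours.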

answered Mar 14, 2015 at 11:47

HYRY
