
I have a big 1D array of data and a starts array of indexes into that data where important things happened. I want an array of ranges: one window of a fixed length for each starting point in starts. Bogus sample data:

data = np.linspace(0,10,50)
starts = np.array([0,10,21])
length = 5

Instinctively, I want to do something like

data[starts:starts+length]

But really, I need to turn starts into a 2D array of range "windows." Coming from functional languages, I would think of it as a map from a list to a list of lists, like:

np.apply_along_axis(lambda i: np.arange(i,i+length), 0, starts)

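But that won't work, because apply_along_axis passes whole 1-D slices to the function: with a 1D starts array, the lambda receives the entire array at once rather than one start index at a time.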

You can do this:

pairs = np.vstack([starts, starts + length]).T
ranges = np.apply_along_axis(lambda p: np.arange(*p), 1, pairs)
data[ranges]

Or you can do it with a list comprehension:

data[np.array([np.arange(i,i+length) for i in starts])]

Or you can do it iteratively. (Bleh.)

Is there a concise, idiomatic way to slice into an array at certain start points like this? (Pardon the numpy newbie-ness.)

  • Note: this is not the same problem as stackoverflow.com/questions/12589923/… - that question has irregular lengths. Here, what I'm doing should fit fine in a normal rectangular array. Commented Mar 13, 2015 at 19:48

2 Answers

data = np.linspace(0,10,50)
starts = np.array([0,10,21])
length = 5

For a NumPy-only way of doing this, you can use numpy.meshgrid(), as described here:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html

As hpaulj pointed out in the comments, meshgrid isn't actually needed for this problem, because array broadcasting gives the same index array directly:

http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

# indices = sum(np.meshgrid(np.arange(length), starts))

indices = np.arange(length) + starts[:, np.newaxis]
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [21, 22, 23, 24, 25]])
data[indices]

returns

array([[ 0.        ,  0.20408163,  0.40816327,  0.6122449 ,  0.81632653],
       [ 2.04081633,  2.24489796,  2.44897959,  2.65306122,  2.85714286],
       [ 4.28571429,  4.48979592,  4.69387755,  4.89795918,  5.10204082]])
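
To make the broadcasting step a bit more explicit, here is a small self-contained sketch using the sample data from the question. The names `windows` and `expected` are only illustrative. It checks that the broadcast index array selects the same values as the list comprehension from the question.

```python
import numpy as np

data = np.linspace(0, 10, 50)
starts = np.array([0, 10, 21])
length = 5

# starts[:, np.newaxis] has shape (3, 1); np.arange(length) has shape (5,).
# Broadcasting the sum gives a (3, 5) array: one row of indices per start.
indices = starts[:, np.newaxis] + np.arange(length)

# Indexing with an integer array returns a new (3, 5) array (a copy, not a view).
windows = data[indices]

# Same values as slicing one window at a time, as in the question.
expected = np.array([data[i:i + length] for i in starts])
print(np.array_equal(windows, expected))
```

Note that every start must satisfy start + length <= len(data); otherwise the index array contains out-of-range values and NumPy raises an IndexError.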

3 Comments

nice one, + 1 from me--as far as i know, most concise & elegant way to do this.
indices=np.arange(length)+starts[:,None] also works
@hpaulj Doh! That's much smoother. Updated answer.
2

If you need to do this many times, you can use as_strided() to create a sliding-window view of data:

data = np.linspace(0,10,50000)
length = 5
starts = np.random.randint(0, len(data)-length, 10000)

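from numpy.lib.stride_tricks import as_strided
# Row i of the view is data[i:i+length]; no data is copied.
# writeable=False guards against accidental writes through the overlapping view.
sliding_window = as_strided(data, shape=(len(data) - length + 1, length),
                            strides=(data.strides[0], data.strides[0]),
                            writeable=False)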

Then you can use:

sliding_window[starts]

to get what you want.

It can also be faster than building the index array each time, since the windowed view is created once and does not copy the data.
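
As a possible alternative to computing the shape and strides by hand, newer NumPy versions provide `sliding_window_view`. The sketch below assumes NumPy 1.20 or later (this helper did not exist when the answer was written) and reuses the small sample data from the question. The name `view` is only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.linspace(0, 10, 50)
starts = np.array([0, 10, 21])
length = 5

# Read-only view of shape (len(data) - length + 1, length); row i is data[i:i+length].
view = sliding_window_view(data, length)

# Picking rows with an integer array copies just the selected windows.
windows = view[starts]
print(windows.shape)
```

Unlike a hand-written as_strided call, the shape and strides are derived for you, which removes one way of accidentally reading past the end of the buffer.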
