3

I have a time series with 4 features at each step, stored as a 2D array of rows with 4 columns. I want to convert it so that each output element contains the feature vectors of two consecutive rows, N-1 and N.

a = np.array([[1,2,3,0], [4,5,6,0], [7,8,9,0], [10,11,12,0]])
array([[ 1,  2,  3,  0],
       [ 4,  5,  6,  0],
       [ 7,  8,  9,  0],
       [10, 11, 12,  0]])

a.shape
(4, 4)

convert to:

array([[[ 1,  2,  3,  0],
        [ 4,  5,  6,  0]],

       [[ 4,  5,  6,  0],
        [ 7,  8,  9,  0]],

       [[ 7,  8,  9,  0],
        [10, 11, 12,  0]]])
a_.shape
(3, 2, 4)

I'm using the following code to do that:

seq_len = 2
for i in range(seq_len, a.shape[0]+1):
    if i-seq_len == 0:
        a_ = a[i-seq_len:i, :].reshape(1, -1, 4)
    else:
        a_ = np.vstack([a_, a[i-seq_len:i, :].reshape(1, -1, 4)])

It works, but I don't think it is an optimal solution. How can I improve the code by avoiding the 'for' loop?

2 Answers

8

Slice the array into two views offset by one row, then join them with np.stack along axis 1.

np.stack((a[:-1], a[1:]), axis=1)
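To make the one-liner concrete, here is a minimal sketch applied to the array from the question. The variable name `pairs` is only illustrative and does not appear in the original post.

```python
import numpy as np

a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])

# a[:-1] holds rows 0..n-2 and a[1:] holds rows 1..n-1.
# Stacking them along a new axis 1 pairs each row with its successor.
pairs = np.stack((a[:-1], a[1:]), axis=1)

print(pairs.shape)
print(pairs[0])
```

With the 4x4 input, the result should have shape (3, 2, 4), matching the target layout in the question.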

Some timings for comparison with the other answer:

In [13]: s = 1_000_000

In [15]: a = np.arange(s).reshape((s//4,4))

In [21]: %timeit a[[(i-1,i) for i in range(1,a.shape[0])],:]
127 ms ± 724 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [22]: %timeit np.stack((a[:-1], a[1:]), axis=1)  # My solution
6.8 ms ± 8.18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Avoiding any Python-level for-loop is the way to go; the OP was right.
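The question uses a `seq_len` variable, so the same idea may need to cover windows longer than 2. The sketch below is one possible generalization, not part of the original answer: `make_windows` and `make_windows_view` are hypothetical helper names, and the second variant assumes NumPy 1.20 or newer, where `sliding_window_view` is available. The first helper still loops, but only over `seq_len` slices rather than over every row.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(a, seq_len):
    """Hypothetical helper: stack seq_len row-shifted slices along axis 1."""
    n_windows = a.shape[0] - seq_len + 1
    if n_windows <= 0:
        raise ValueError("seq_len is larger than the number of rows")
    # Slice k starts at row k; all slices have n_windows rows.
    return np.stack([a[k:k + n_windows] for k in range(seq_len)], axis=1)


def make_windows_view(a, seq_len):
    """Hypothetical helper: same layout, built as a read-only view."""
    # sliding_window_view appends the window axis last, giving
    # (n_windows, n_features, seq_len); transpose to (n_windows, seq_len, n_features).
    return sliding_window_view(a, seq_len, axis=0).transpose(0, 2, 1)


a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])
print(make_windows(a, 2).shape)
print(make_windows_view(a, 3).shape)
```

The view-based variant avoids copying the data, but the result is read-only, so copy it before modifying it in place.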

0

Use integer-array (fancy) indexing with a list of index pairs: a[[(i-1,i) for i in range(1,a.shape[0])],:]

Edit: nicoco's answer is the better one.
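If the indexing approach is preferred, the Python-level list comprehension can be replaced by a broadcast index array. This is a sketch under that assumption, not the answerer's code; the names `starts`, `offsets`, `idx`, and `windows` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [10, 11, 12, 0]])
seq_len = 2

# Column vector of window start rows and row vector of in-window offsets.
starts = np.arange(a.shape[0] - seq_len + 1)[:, None]  # shape (3, 1)
offsets = np.arange(seq_len)[None, :]                  # shape (1, 2)

# Broadcasting gives a (3, 2) array of row indices: [[0, 1], [1, 2], [2, 3]].
idx = starts + offsets

# Indexing with a 2D integer array yields shape idx.shape + a.shape[1:].
windows = a[idx]
print(windows.shape)
```

This builds the indices inside NumPy, but it still copies the selected rows into a new array.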