
Given a NumPy array:

arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])

I wonder how to access chunks of a chosen size with a chosen separation, both concatenated and as slices.

E.g., obtain chunks of size 3 separated by two values:

arr_chunk_3_sep_2 = np.array([0,1,2,5,6,7,10,11,12])
arr_chunk_3_sep_2_in_slices = np.array([[0,1,2],[5,6,7],[10,11,12]])

What is the most efficient way to do it? I would like to avoid copying or creating new objects as much as possible. Maybe memoryviews could be of help here?

  • For the first part, what's wrong with arr[[1, 3, 7, 11]]? For the second, what have you tried? [And why do you think it's possible to do in-place?] Commented Aug 9, 2018 at 13:03
  • stackoverflow.com/questions/24426452/… Commented Aug 9, 2018 at 13:04
  • However, this doesn't solve the issue with arbitrary size chunks. I don't see clearly how to slice pieces of an array without providing the index of each value one by one. Commented Aug 9, 2018 at 13:12
  • I don't see how the second part is related to the first one. Why not make a separate question off the second one? Commented Aug 9, 2018 at 13:14
  • That "arbitrary" in "arbitrary size chunks" looks dubious. Don't you mean a given chunk size? "Arbitrary size chunks" could mean that the chunks in the output could have a variable number of elements, which doesn't seem to be the case from the posted expected output. Commented Aug 9, 2018 at 13:24

2 Answers

3

Approach #1

Here's one with masking -

def slice_grps(a, chunk, sep):
    N = chunk + sep
    return a[np.arange(len(a))%N < chunk]

Sample run -

In [223]: arr
Out[223]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [224]: slice_grps(arr, chunk=3, sep=2)
Out[224]: array([ 0,  1,  2,  5,  6,  7, 10, 11, 12])
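
As a minimal sketch of how the mask relates to the 2D form asked for in the question: the helper name chunk_mask below is hypothetical (it is not part of the answer), and the reshape step assumes the number of selected elements is a multiple of the chunk size. Note that boolean indexing returns a copy, not a view.

```python
import numpy as np

def chunk_mask(length, chunk, sep):
    # Hypothetical helper: True inside a chunk, False inside a gap
    return np.arange(length) % (chunk + sep) < chunk

arr = np.arange(13)
mask = chunk_mask(len(arr), chunk=3, sep=2)
flat = arr[mask]            # boolean indexing allocates a new array
rows = flat.reshape(-1, 3)  # only valid if flat.size is a multiple of chunk
```

The mask depends only on the array length, chunk and sep, so it can be computed once and reused for any array of the same length.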

Approach #2

If the input array is such that the last chunk would have enough runway, we could leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements off each block of n elements -

# https://stackoverflow.com/a/51640641/ @Divakar
def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]

out = skipped_view(arr,chunk,chunk+sep)

Note that the output is a view into the input array, so it incurs no extra memory overhead and is virtually free to create!

Sample run to make things clear -

In [255]: arr
Out[255]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [256]: chunk = 3

In [257]: sep = 2

In [258]: skipped_view(arr,chunk,chunk+sep)
Out[258]: 
array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12]])

# Let's prove that the output is a view indeed
In [259]: np.shares_memory(arr, skipped_view(arr,chunk,chunk+sep))
Out[259]: True
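
A possible alternative, sketched here under the assumption that NumPy 1.20 or later is available: np.lib.stride_tricks.sliding_window_view builds every window of length chunk as a read-only view, and a strided slice then keeps every (chunk + sep)-th window. The helper name chunk_view is hypothetical and not from the answer. This variant never addresses memory outside the buffer; a trailing incomplete chunk is simply dropped.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def chunk_view(a, chunk, sep):
    # Hypothetical helper: all length-`chunk` windows of `a`,
    # then keep only the windows that start a new block
    return sliding_window_view(a, chunk)[::chunk + sep]

arr = np.arange(13)
view = chunk_view(arr, chunk=3, sep=2)
```

Like skipped_view, the result shares memory with the input and is not writeable by default.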

7 Comments

Good! The use of modulo means it can be sped up in Cython using C-like division.
I still don't see a clear solution for the sliced version though
I mean how to arrive at np.array([[0,1,2],[5,6,7],[10,11,12]])
@ibarrond That would be slice_grps(arr, chunk=3, sep=2).reshape(-1,chunk). I solved for the generic flattened case because there might be cases when the output size isn't a multiple of chunk.
I see! I was just missing the reshape. About the output size being a multiple of chunk, I would leave that to the actual implementation. If I am applying this to a vector of size 10,000, I can handpick the behaviour of the last chunk without severely impacting performance
2

How about a reshape and slice?

In [444]: arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
In [445]: arr.reshape(-1,5)
...
ValueError: cannot reshape array of size 13 into shape (5)

Ah, a problem: your array isn't big enough for this reshape, so we have to pad it:

In [446]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)
Out[446]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12,  0,  0]])
In [447]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)[:,:-2]
Out[447]: 
array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12]])

as_strided can get away with this by including bytes outside the data buffer. Usually that's seen as a bug, though here it can be an asset, provided you really do throw that garbage away.

Or throw away the last incomplete row:

In [452]: arr[:-3].reshape(-1,5)[:,:3]
Out[452]: 
array([[0, 1, 2],
       [5, 6, 7]])
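
The two variants above can be folded into one function. This is only a sketch: the name chunks_by_reshape and the pad_value parameter are hypothetical, introduced here for illustration. With pad_value=None the incomplete trailing block is dropped and the result stays a view; with a pad value, np.concatenate has to copy the data.

```python
import numpy as np

def chunks_by_reshape(a, chunk, sep, pad_value=None):
    n = chunk + sep
    remainder = a.size % n
    if remainder:
        if pad_value is None:
            # Drop the incomplete trailing block (slicing keeps a view)
            a = a[:a.size - remainder]
        else:
            # Pad up to a full block; concatenate allocates a new array
            padding = np.full(n - remainder, pad_value, dtype=a.dtype)
            a = np.concatenate((a, padding))
    # One block of n elements per row, then keep the first `chunk` columns
    return a.reshape(-1, n)[:, :chunk]

arr = np.arange(13)
dropped = chunks_by_reshape(arr, chunk=3, sep=2)
padded = chunks_by_reshape(arr, chunk=3, sep=2, pad_value=0)
```

Padding values show up in the output only when the trailing block holds fewer than chunk elements; in the example the last block has exactly three, so none appear.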
