Accessing chunks at once in a numpy array

Question

Provided a numpy array:

arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])

I wonder how to access chunks of a chosen size with a chosen separation, both concatenated and as slices:

E.g.: obtain chunks of size 3 separated by two values:

arr_chunk_3_sep_2 = np.array([0,1,2,5,6,7,10,11,12])
arr_chunk_3_sep_2_in_slices = np.array([[0,1,2],[5,6,7],[10,11,12]])

What is the most efficient way to do it? I would like to avoid copying or creating new objects as much as possible. Maybe memoryviews could be of help here?

For the first part, what's wrong with arr[[1, 3, 7, 11]]? For the second, what have you tried? [And why do you think it's possible to do in-place?]
– jpp, Commented Aug 9, 2018 at 13:03
However, this doesn't solve the issue with arbitrary size chunks. I don't see clearly how to slice pieces of an array without providing the indexes of each value one by one.
– ibarrond, Commented Aug 9, 2018 at 13:12
I don't see how the second part is related to the first one. Why not make a separate question off the second one?
– Divakar, Commented Aug 9, 2018 at 13:14
That "arbitrary" in "arbitrary size chunks" looks dubious. Don't you mean a given chunk size? "Arbitrary size chunks" could mean that the chunks in the output could have a variable number of elements, which doesn't seem to be the case from the posted expected output.
– Divakar, Commented Aug 9, 2018 at 13:24

Divakar · Accepted Answer · 2018-08-09 13:59:07Z

3

Approach #1

Here's one with masking -

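import numpy as np

def slice_grps(a, chunk, sep):
    N = chunk + sep  # period: one chunk plus the gap that follows it
    # Keep the positions whose offset within the period falls inside the chunk
    return a[np.arange(len(a))%N < chunk]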

Sample run -

In [223]: arr
Out[223]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [224]: slice_grps(arr, chunk=3, sep=2)
Out[224]: array([ 0,  1,  2,  5,  6,  7, 10, 11, 12])
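
To see why the modulo test picks the right elements, it can help to look at the intermediate arrays. The following is a minimal sketch of that, not part of the original answer; the names period, offsets, mask and flat are illustrative only.

```python
import numpy as np

arr = np.arange(13)
chunk, sep = 3, 2
period = chunk + sep  # one chunk plus the gap that follows it

# Offset of every position inside its period: 0 1 2 3 4 0 1 2 3 4 0 1 2
offsets = np.arange(len(arr)) % period

# True for the first `chunk` positions of each period
mask = offsets < chunk

# Boolean indexing builds a new array, so `flat` is a copy, not a view
flat = arr[mask]
print(flat)  # [ 0  1  2  5  6  7 10 11 12]
```

Since boolean indexing copies, this approach yields the concatenated form at the cost of one new array of the output size.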

Approach #2

If the input array is such that the last chunk would have enough runway, we could leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements off each block of n elements -

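# https://stackoverflow.com/a/51640641/ @Divakar
def skipped_view(a, m, n):
    s = a.strides[0]  # byte step between consecutive elements of the 1D input
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)  # ceil(a.size / n) rows of n elements each
    # Rows start n elements apart; keep only the first m columns of each row
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]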

out = skipped_view(arr,chunk,chunk+sep)

Note that the output is a view into the input array, so it adds no extra memory overhead and is virtually free!

Sample run to make things clear -

In [255]: arr
Out[255]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [256]: chunk = 3

In [257]: sep = 2

In [258]: skipped_view(arr,chunk,chunk+sep)
Out[258]: 
array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12]])

# Let's prove that the output is a view indeed
In [259]: np.shares_memory(arr, skipped_view(arr,chunk,chunk+sep))
Out[259]: True
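
As a possible alternative that never addresses memory outside the array, the same 2D view can be sketched with sliding_window_view. This assumes NumPy 1.20 or later, where that function is available; chunk_view is a hypothetical helper name, not something from the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def chunk_view(a, chunk, sep):
    """Read-only view of the complete chunks of a 1D array (hypothetical helper)."""
    # Every window of length `chunk`, then keep one window per period
    return sliding_window_view(a, chunk)[::chunk + sep]

arr = np.arange(13)
rows = chunk_view(arr, chunk=3, sep=2)
print(rows.tolist())                 # [[0, 1, 2], [5, 6, 7], [10, 11, 12]]
print(np.shares_memory(arr, rows))   # True

# Flattening a non-contiguous view has to copy: the concatenated form
# cannot be described by a single constant stride
flat = rows.ravel()
print(np.shares_memory(arr, flat))   # False
```

In this sketch a trailing chunk with fewer than chunk elements is simply left out, and an input shorter than chunk raises a ValueError.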

edited Aug 9, 2018 at 13:59

answered Aug 9, 2018 at 13:32


7 Comments

ibarrond Over a year ago

Good! The use of modulo means it can be sped up in Cython using C-like division.

ibarrond Over a year ago

I still don't see a clear solution for the sliced version though

ibarrond Over a year ago

I mean how to arrive at np.array([[0,1,2],[5,6,7],[10,11,12]])

Divakar Over a year ago

@ibarrond That would be slice_grps(arr, chunk=3, sep=2).reshape(-1,chunk). I solved for the generic flattened case because there might be cases when the output size isn't a multiple of chunk.

ibarrond Over a year ago

I see! I was just missing the reshape. About the output size being a multiple of chunk, I would leave that to the actual implementation. If I am applying this to a vector of size 10,000, I can handpick the behaviour of the last chunk without severely impacting performance.


hpaulj · Accepted Answer · 2018-08-09 15:57:18Z

How about a reshape and slice?

In [444]: arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
In [445]: arr.reshape(-1,5)
...
ValueError: cannot reshape array of size 13 into shape (5)

Ah, a problem - your array isn't big enough for this reshape - so we have to pad it:

In [446]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)
Out[446]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12,  0,  0]])
In [447]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)[:,:-2]
Out[447]: 
array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12]])
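
The padding amount can be computed rather than hard-coded. Below is a minimal sketch under that assumption; chunks_padded and its fill parameter are hypothetical names introduced here for illustration.

```python
import numpy as np

def chunks_padded(a, chunk, sep, fill=0):
    """Pad a 1D array to a whole number of periods, then slice (hypothetical helper)."""
    period = chunk + sep
    pad = -len(a) % period  # elements missing to complete the last period
    # np.concatenate always allocates a new array, so this is a copy of `a`
    padded = np.concatenate((a, np.full(pad, fill, dtype=a.dtype)))
    return padded.reshape(-1, period)[:, :chunk]

arr = np.arange(13)
out = chunks_padded(arr, chunk=3, sep=2)
print(out.tolist())                # [[0, 1, 2], [5, 6, 7], [10, 11, 12]]
print(np.shares_memory(arr, out))  # False
```

The result is a view of the padded copy, not of the original array. If the last chunk itself is incomplete, its missing entries show up as fill values, so the caller has to decide whether that is acceptable.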

as_strided can get away with this by including bytes outside the data buffer. Usually that's seen as a bug, though here it can be an asset - provided you really do throw that garbage away.

Or throwing away the last incomplete line:

In [452]: arr[:-3].reshape(-1,5)[:,:3]
Out[452]: 
array([[0, 1, 2],
       [5, 6, 7]])
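
The truncation can likewise be written for any chunk and separation. This is a sketch only, with chunks_truncated as a hypothetical helper name; like the session above, it keeps only whole periods of chunk + sep elements, so the trailing [10, 11, 12] is dropped.

```python
import numpy as np

def chunks_truncated(a, chunk, sep):
    """View of the chunks that belong to complete periods (hypothetical helper)."""
    period = chunk + sep
    n_full = len(a) // period  # number of complete periods
    # Basic slicing and reshaping a contiguous 1D array do not copy
    return a[:n_full * period].reshape(n_full, period)[:, :chunk]

arr = np.arange(13)
out = chunks_truncated(arr, chunk=3, sep=2)
print(out.tolist())                # [[0, 1, 2], [5, 6, 7]]
print(np.shares_memory(arr, out))  # True
```

Unlike the padded variant, this stays a view into the original array, at the price of losing any data after the last complete period.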
