
I have the following numpy array:

array=[1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7]

I need to break this array into smaller arrays of identical values, such as

[1,1,1,1] and [3,3,3]

My code for this is as follows but it doesn't work:

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq)-size))
counter=0
sub_arr=[]
arr=[]
for i in range(len(array)):
    if(array[i]==array[i+1]):
        counter+=1
    else:
        break
        subarr=chunker(array,counter)
    arr.append(sub_arr)
    array=array[counter:]

What is an efficient way to break down the array into smaller arrays of equal/same values?

  • Do you expect/care about arrays like [1,1,2,2,1,1]? Do you care about the order of the subarrays in the list? (Should it match the original order?) Commented Jul 10, 2018 at 7:37

3 Answers


A numpy solution for floats and integers:

import numpy as np
a = np.asarray([1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7])
#calculate differences between neighbouring elements and get index where element changes
#sample output for index would be [ 4  6  9 10 16]
index = np.where(np.diff(a) != 0)[0] + 1
#separate arrays
print(np.split(a, index))

Sample output:

[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]

If you had strings, this method naturally wouldn't work, since np.diff needs numeric data. In that case you should go with DyZ's itertools approach.
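That said, a dtype-agnostic variant is possible (not part of the original answer): comparing neighbouring elements with `!=` instead of `np.diff` gives the same split points and also works for string arrays. A minimal sketch, using a made-up string array:

```python
import numpy as np

a = np.asarray(["a", "a", "b", "b", "b", "c"])
# True wherever the value changes relative to its left neighbour;
# +1 shifts the boolean positions to split indices
index = np.where(a[1:] != a[:-1])[0] + 1
parts = np.split(a, index)
print(parts)  # [array(['a', 'a'], ...), array(['b', 'b', 'b'], ...), array(['c'], ...)]
```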


NumPy has poor support for this kind of grouping. I suggest using itertools.groupby, which operates on any iterable.

import numpy as np
from itertools import groupby
[np.array(list(data)) for _, data in groupby(array)]
#[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), \
# array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]

This is not necessarily the most efficient method, because it involves conversions to and from lists.
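One property worth noting (not stated in the original answer): groupby groups *consecutive* runs, so input like [1,1,2,2,1,1] from the comments keeps its two separate runs of 1s and their original order. A quick sketch:

```python
import numpy as np
from itertools import groupby

arr = [1, 1, 2, 2, 1, 1]
# groupby starts a new group each time the value changes,
# so the second run of 1s stays separate from the first
runs = [np.array(list(g)) for _, g in groupby(arr)]
print(runs)  # [array([1, 1]), array([2, 2]), array([1, 1])]
```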


Here's an approach using Pandas:

import pandas as pd 

(pd.Series(array)
   .value_counts()
   .reset_index()
   .apply(lambda x: [x["index"]] * x[0], axis=1))  

Explanation:
First, convert array to a Series, and use value_counts() to get a count of each unique entry:

counts = pd.Series(array).value_counts().reset_index()
   index  0
0      6  6
1      1  4
2      3  3
3      2  2
4      7  1
5      5  1

Then recreate each repeated-element list, using apply():

counts.apply(lambda x: [x["index"]] * x[0], axis=1)

0    [6, 6, 6, 6, 6, 6]
1          [1, 1, 1, 1]
2             [3, 3, 3]
3                [2, 2]
4                   [7]
5                   [5]
dtype: object

You can use the .values property to convert from a Series of lists to a list of lists, if needed.

4 Comments

This works only if a range of values is not repeated (as in [1,1,2,2,1,1]). And it does not preserve the order.
@DyZ not sure I'm tracking - what do you mean it only works if a range of values isn't repeated? OP asked for "smaller arrays of same values". My approach correctly groups a list of 1s and a list of 2s in your example. OP also doesn't ask for order preserved as far as I can tell. Can you clarify?
When I apply your method to my example, I get two ranges: [1,1,1,1] and [2,2] - instead of three ranges [1,1], [2,2], and [1,1]. I am not sure how important it is for the OP, just making the point.
Ah ok - I interpreted OP's request as needing two ranges, not three, in your example. Thanks for clarifying!
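If the three-run interpretation from this comment thread is what's wanted, a possible order-preserving Pandas variant (a sketch, not the answer's method) labels each consecutive run via shift/cumsum and groups on that label:

```python
import pandas as pd

arr = [1, 1, 2, 2, 1, 1]
s = pd.Series(arr)
# run_id increments whenever the value differs from its predecessor,
# so each consecutive run gets its own group label
run_id = s.ne(s.shift()).cumsum()
runs = [grp.tolist() for _, grp in s.groupby(run_id)]
print(runs)  # [[1, 1], [2, 2], [1, 1]]
```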
