5

I want to detect consecutive spans of 1's in a numpy array. Indeed, I want to first identify whether the element in an array is in a span of a least three 1's. For example, we have the following array a:

    import numpy as np
    a = np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0])

Then the following 1's in bold are the elements satisfy the requirement.

[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]

Next, if two spans of 1's are separated by at most two 0's, then the two spans make up a longer span. So the above array is charaterized as

[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]

In other words, for the original array as input, I want the output as follows:

    [True, True, True, True, True, True, True, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, False]

I have been thinking of an algorithm to implement this function, but all the one I come up with seems to complicated. So I would love to know better ways to implement this -- it would be greatly appreciated if someone can help me out.

Update:

I apologize that I did not make my question clear. I want to identify 3 or more consecutive 1's in the array as a span of 1's, and any two spans of 1's with only one or two 0's in between are identified, along with the separating 0's, as a single long span. My goal can be understood in the following way: if there are only one or two 0's between spans of 1's, I consider those 0's as errors and are supposed to be corrected as 1's.

@ritesht93 provided an answer that almost gives what I want. However, the current answer does not identify the case when there are three spans of 1's that are separated by 0's, which should be identified as one single span. For example, for the array

    a2 = np.array([0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0])

we should receive the output

    [False,  True,  True,  True,  True,  True,  True,  True,  True,
   True,  True,  True,  True,  True, False, False,  False, False,
   False,  True,  True,  True,  True,  True, False]

Update 2:

I was greatly inspired by and found the algorithm based on regular expression is easiest to implement and to understand -- though I am not sure about the efficient compared to other methods. Eventually I used the following method.

    lst = np.array([0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0])
    lst1 = re.sub(r'1{3,}', lambda x:'c'*len(x.group()), ''.join(map(str, lst)))
    print lst1

which identified spans of 1's

    0ccc0ccc00cccc00100ccccc0

and then connect spans of 1's

    lst2 = re.sub(r'c{1}0{1,2}c{1}', lambda x:'c'*len(x.group()), ''.join(map(str, lst1)))
    print lst2

which gives

    0ccccccccccccc00100ccccc0

The final result is given by

    np.array(list(lst2)) == 'c'

    array([False,  True,  True,  True,  True,  True,  True,  True,  True,
    True,  True,  True,  True,  True, False, False, False, False,
   False,  True,  True,  True,  True,  True, False])
9
  • "Array characterized as:" and "want the output as follows" contradicts each other: Should positions 8 and 9 be part of a "span" ("True") or not (not bold)? Commented Oct 31, 2016 at 9:14
  • @hvwaldow Yes, you are correct. Thanks for pointing that out. Arealdy corrected. Commented Oct 31, 2016 at 9:39
  • "However, the current answer does not identify the case when there are three spans of 1's that are seperated by 0's,...." Hmm. I don't see this. My answer seems to produce the correct solution for your second testcase. Commented Oct 31, 2016 at 11:38
  • @hvwaldow Please refer to the updated description of the question. Sorry for not describing the question clearly and for the mistakes on the desired output... Commented Oct 31, 2016 at 11:44
  • I did. Validates against 2nd testcase. Commented Oct 31, 2016 at 11:51

4 Answers 4

2

We could solve it with a combination of binary dilation and erosion to get past the first stage and then binary closing to get the final output, like so -

from scipy.ndimage.morphology import binary_erosion,binary_dilation,binary_closing

K = np.ones(3,dtype=int) # Kernel
b = binary_dilation(binary_erosion(a,K),K)
out = binary_closing(b,K) | b

Sample runs

Case #1 :

In [454]: a
Out[454]: array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0])

In [456]: out
Out[456]: 
array([ True,  True,  True,  True,  True,  True,  True, False, False,
       False, False, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False], dtype=bool)

Case #2 :

In [460]: a
Out[460]: 
array([0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0])

In [461]: out
Out[461]: 
array([False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True, False, False, False, False,
       False,  True,  True,  True,  True,  True, False], dtype=bool)
Sign up to request clarification or add additional context in comments.

Comments

1

Instead of solving the conventional way of looping and maintaining count we can convert all 0's and 1's to a single string and replace a regex match with another char say 2. Once that is done we split out the string again and check for bool() over each char.

>>> import re
>>> lst=[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
>>> list(map(bool, map(int, list(re.sub(r'1{3,}0{1,2}1{3,}', lambda x:'2'*len(x.group()), ''.join(map(str, lst)))))))
[True, True, True, True, True, True, True, False, True, True, False, False, True, True, True, True, True, True, True, True, True, True, False]
>>> 

All the operations happen over here:

re.sub(r'1{3,}0{1,2}1{3,}', lambda x:'2'*len(x.group()), ''.join(map(str, lst)))

where it searches for a contiguous occurrence of 3 or more 1's followed by at most 2 0's i.e 1 or 2 0's followed by 3 or more 1's and replaces the whole matched string with same length string of 2's (used 2 because bool(2) is True). Also you can use tolist() method in NumPy to get a list out of NumPy array like this: np.array([1,2, 3, 4, 5, 6]).tolist()

Edit 1 : after a change in question, here is the updated answer:

>>> lst=[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
>>> import re
>>> list(map(lambda x:False if x == 0 or x ==1 else True, map(int, list(re.sub(r'1{3,}0{1,2}1{3,}', lambda x:'2'*len(x.group()), ''.join(map(str, lst)))))))
[True, True, True, True, True, True, True, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, False]
>>> 

Edit 2 FINAL ANSWER:

>>> import re
>>> lst=[0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0]
>>> while re.subn(r'[12]{3,}0{1,2}[12]{3,}', lambda x:'2'*len(x.group()), ''.join(map(str, lst)))[1]:
...     lst=re.subn(r'[12]{3,}0{1,2}[12]{3,}', lambda x:'2'*len(x.group()), ''.join(map(str, lst)))[0]
... 
>>> lst
'0222222222222200100111110'
>>> lst=list(re.sub(r'1{3,}', lambda x:'2'*len(x.group()), ''.join(map(str, lst))))
>>> lst
['0', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '0', '0', '1', '0', '0', '2', '2', '2', '2', '2', '0']
>>> list(map(lambda x:False if x == 0 or x ==1 else True, map(int, lst)))
[False, True, True, True, True, True, True, True, True, True, True, True, True, True, False, False, False, False, False, True, True, True, True, True, False]
>>> 

6 Comments

This doesn't return the desired result, after correction in original question. Should be easy to fix though.
Well, no it doesn't work like that. Everytime you have something like 0,1,1,0,1,1,1,0,1,1,0 the three 1 will not be caught.
@hvwaldow sorry .. was offline, just saw the updated question and updated my answer accordingly
@hvwaldow so you mean 3 1's should represent true? if yes we can add another OR expression in regex as 1{3,} but I don't think that OP has mentioned anything about that
May I ask what the .group() for x does? x here is a string I guess?
|
1

Many ways to to it. I would split this into grouping, apply conditions to groups and flattening operations. Like so:

from itertools import groupby, starmap
import numpy as np

a = np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0])

def condition(groups, key, newkey, minlen):
    return [(newkey, l) if l < minlen and k == key else (k, l) for k, l in groups]

def flatten(groups):
    return [k for g in starmap(lambda k, l: l * [k], groups) for k in g]

def group(l):
    return [(k, len(list(v))) for k, v in groupby(l)]

res = group(flatten(condition(group(a), 1, 0, 3)))
# groups zeros at the beginning or the end never change to ones,
# no matter their length
res = flatten([res[0]] + condition(res[1:-1], 0, 1, 3) + [res[-1]])
print [bool(v) for v in res]

Comments

1

i know this is not so "python-wise", but since you talks about algorithm, i decided to give it a try (sorry i am not so familiar with python)

import numpy as np
a = np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0])
b = np.array([int])

#init 2nd array 
for x in range (0,(a.size-1)):
    b = np.append(b,0)

print (b)
#1st case
for x in range (2,(a.size)):
    if (a[x-2]==1 & a[x-1]==1 & a[x]==1): #1-1-1
        b[x] = 1
        b[x-1] = 1
        b[x-2] = 1

print (b)
#2nd case
for x in range (2,(b.size)):
    if (b[x-2]==1 & b[x]==1): #1-0-1
        if (b[x-1]==0): #sorry, i forget about logical op. in python
            b[x-1] = 1

print (b)
#3rd case
for x in range (3,(b.size)):
    if (b[x-3]==1 & b[x]==1): #1-0-0-1
        if (b[x-2]==0 & b[x]-1==0):
            b[x-1] = 1
            b[x-2] = 1

#4th case
for x in range (4,(b.size)):
    if (a[x-4]==1 & a[x-3]==1 & b[x]): #1-1-0-0-1
        if (a[x-2]==0 & a[x]-1==0):
            b[x-3] = 1
            b[x-4] = 1
print (b)

I am not sure if this is exactly your expected result, but here it is:
[1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.