How to find location of numpy array in larger array?

Question

Given two Numpy arrays, A and B, how would you find the index where B occurs in A, allowing for some amount of noise?

For example:

>>> A = [1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4]
>>> B = [19.15, 3.35, 7.3]
>>> find_position(A, B)
3

The naive implementation of find_position(a, b) would be to just just loop over every index in A, and then from there iterate over B, calculating it's Euclidean distance for each pair of numbers from A and B, tracking the index that has the smallest distance.

Something like:

def find_position(a, b):
    """
    Finds index of b in a.
    """
    best = (1e999999, None)
    assert a.size >= b.size
    for i in range(a.size - b.size):
        score = sum(abs(b[j] - a[i+j]) for j in range(b.size))
        best = min(best, (score, i))
    return best[1]

I'm guessing this is far from the most efficient solution. I'm not sure what the exact Big-O notation would be, but it's probably close to O(M*N), so for large arrays, this would take forever.

Is there a more efficient approach, or some method built-in to Numpy that makes those nested for loops a bit faster?

My signal processing chops are a bit rusty (to the extent they ever existed, which is very much a subject for debate) but isn't "cross correlation" the search term to use here? — Ture Pålsson
– Ture Pålsson, Commented Feb 23, 2022 at 8:11
I believe it's O(log(M) * N) in case you use np.searchsorted — mathfux
– mathfux, Commented Feb 23, 2022 at 9:15

mathfux · Accepted Answer · 2022-02-23 21:26:50Z

1

You could reduce complexity to O(log(M) * N) if you calculate distances between points of B and its closest companions in A from both left and right sides only:

def find_position(A, B):
    #find indices that sorts an array
    argidx = np.argsort(A)
    
    # find two set of indices with respect to sorted array
    next_idx = np.searchsorted(A, B, sorter=argidx)
    prev_idx = next_idx - 1
    next_idx[next_idx == len(A)], prev_idx[prev_idx == -1] = len(A) - 1, 0 #fixing extreme cases
    
    # find two set of indices with respect to initial array: a set of previous points and a set of next points
    previous_idx, next_idx = argidx[prev_idx], argidx[next_idx]
    
    # find distances between points of B and it closest companions of A in both of sets
    previous_dist, next_dist = np.abs(A[previous_idx]- B), np.abs(A[next_idx]- B)
    
    # find indices of minimal values of distances found
    argmin_previous, argmin_next = np.argmin(previous_dist), np.argmin(next_dist)
    
    # if minimum value of distances in the first set is smaller than the one in the second set, 
    # return its place in initial array; else return place of minimum value in the second set
    if previous_dist[argmin_previous] < next_dist[argmin_next]:
        out = previous_idx[argmin_previous]
    else:
        out = next_idx[argmin_next]
    return out

edited Feb 23, 2022 at 21:26

answered Feb 23, 2022 at 14:28

mathfux

5,9792 gold badges20 silver badges38 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Cerin Over a year ago

I tested this against my sample code, and the performance is identical.

Collectives™ on Stack Overflow

How to find location of numpy array in larger array?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related