1

Given two Numpy arrays, A and B, how would you find the index where B occurs in A, allowing for some amount of noise?

For example:

>>> A = [1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4]
>>> B = [19.15, 3.35, 7.3]
>>> find_position(A, B)
3

The naive implementation of find_position(a, b) would be to just just loop over every index in A, and then from there iterate over B, calculating it's Euclidean distance for each pair of numbers from A and B, tracking the index that has the smallest distance.

Something like:

def find_position(a, b):
    """
    Finds index of b in a.
    """
    best = (1e999999, None)
    assert a.size >= b.size
    for i in range(a.size - b.size):
        score = sum(abs(b[j] - a[i+j]) for j in range(b.size))
        best = min(best, (score, i))
    return best[1]

I'm guessing this is far from the most efficient solution. I'm not sure what the exact Big-O notation would be, but it's probably close to O(M*N), so for large arrays, this would take forever.

Is there a more efficient approach, or some method built-in to Numpy that makes those nested for loops a bit faster?

2
  • My signal processing chops are a bit rusty (to the extent they ever existed, which is very much a subject for debate) but isn't "cross correlation" the search term to use here? Commented Feb 23, 2022 at 8:11
  • I believe it's O(log(M) * N) in case you use np.searchsorted Commented Feb 23, 2022 at 9:15

1 Answer 1

1

You could reduce complexity to O(log(M) * N) if you calculate distances between points of B and its closest companions in A from both left and right sides only:

def find_position(A, B):
    #find indices that sorts an array
    argidx = np.argsort(A)
    
    # find two set of indices with respect to sorted array
    next_idx = np.searchsorted(A, B, sorter=argidx)
    prev_idx = next_idx - 1
    next_idx[next_idx == len(A)], prev_idx[prev_idx == -1] = len(A) - 1, 0 #fixing extreme cases
    
    # find two set of indices with respect to initial array: a set of previous points and a set of next points
    previous_idx, next_idx = argidx[prev_idx], argidx[next_idx]
    
    # find distances between points of B and it closest companions of A in both of sets
    previous_dist, next_dist = np.abs(A[previous_idx]- B), np.abs(A[next_idx]- B)
    
    # find indices of minimal values of distances found
    argmin_previous, argmin_next = np.argmin(previous_dist), np.argmin(next_dist)
    
    # if minimum value of distances in the first set is smaller than the one in the second set, 
    # return its place in initial array; else return place of minimum value in the second set
    if previous_dist[argmin_previous] < next_dist[argmin_next]:
        out = previous_idx[argmin_previous]
    else:
        out = next_idx[argmin_next]
    return out
Sign up to request clarification or add additional context in comments.

1 Comment

I tested this against my sample code, and the performance is identical.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.