Given two Numpy arrays, A and B, how would you find the index where B occurs in A, allowing for some amount of noise?
For example:
>>> A = [1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4]
>>> B = [19.15, 3.35, 7.3]
>>> find_position(A, B)
3
The naive implementation of find_position(a, b) would be to just loop over every start index in A, and from there iterate over B, summing the absolute difference for each pair of numbers from A and B, and tracking the start index with the smallest total distance.
Something like:
def find_position(a, b):
    """
    Finds the index of b in a.
    """
    assert a.size >= b.size
    best = (float('inf'), None)
    # The +1 is needed so the last possible alignment of b against a is also checked.
    for i in range(a.size - b.size + 1):
        score = sum(abs(b[j] - a[i + j]) for j in range(b.size))
        best = min(best, (score, i))
    return best[1]
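For what it's worth, a quick sanity check of that function against the example above (with the lists converted to NumPy arrays first, since the function relies on .size):

import numpy as np

A = np.array([1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4])
B = np.array([19.15, 3.35, 7.3])
print(find_position(A, B))  # prints 3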
I'm guessing this is far from the most efficient solution. I'm not sure what the exact Big-O notation would be, but it's roughly O(M*N), where M and N are the lengths of A and B, so for large arrays this would take forever.
Is there a more efficient approach, or some method built into Numpy that makes those nested for loops a bit faster?
O(log(M) * N) in case you use np.searchsorted
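A minimal sketch of one possible vectorized approach, assuming numpy.lib.stride_tricks.sliding_window_view is available (NumPy 1.20+). It does the same O(M*N) amount of work as the loop version, but both loops run inside NumPy rather than in Python:

import numpy as np

def find_position_vectorized(a, b):
    """Start index in a of the window closest to b by summed absolute difference."""
    assert a.size >= b.size
    # View of every length-b.size window of a, shape (a.size - b.size + 1, b.size); no copy is made.
    windows = np.lib.stride_tricks.sliding_window_view(a, b.size)
    # One score per candidate start position, computed in a single vectorized pass.
    scores = np.abs(windows - b).sum(axis=1)
    return int(np.argmin(scores))

A = np.array([1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4])
B = np.array([19.15, 3.35, 7.3])
print(find_position_vectorized(A, B))  # prints 3

If true Euclidean distance per window is wanted instead of the summed absolute difference, the scores line can be swapped for np.linalg.norm(windows - b, axis=1); the argmin stays the same.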