Find nearest indices for one array against all values in another array - Python / NumPy

Question

I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.

My current approach with numpy:

import numpy as np

refArray = np.random.random(16);
myArray = np.random.random(1000);


def find_nearest(array, value):
    idx = (np.abs(array-value)).argmin()
    return idx;

for value in np.nditer(myArray):
    index = find_nearest(refArray, value);
    print(index);

Unfortunately, this takes ages for a large amount of values. Is there a faster or more "pythonian" way of matching each value in myArray to the closest value in refArray?

FYI: I don't necessarily need numpy in my script.

Important: the order of both myArray as well as refArray is important and should not be changed. If sorting is to be applied, the original index should be retained in some way.

In terms of time complexity, a sliding window will probably be the most efficient. — willeM_ Van Onsem
– willeM_ Van Onsem, Commented Jul 27, 2017 at 11:33
I cannot see a sliding window being any more efficient than the current solution. To the best of my understanding, the current solution is O(n) which is the best to hope for. Then, there are some trade offs to do, in order to minimize the time constant. But that depends on wether your big case is exploding your memory or not. If it is not a memory issue, it might be possible to gain a little from using broadcasting and use more numpy calculations, but that might just as well slow you down if RAM memory is an issue. — JohanL
– JohanL, Commented Jul 27, 2017 at 11:39
@JohanL RAM is not an issue, time unfortunately is. This simple loop was both the simplest but also the best approach I could think of.. Unfortunately, with array sizes of ref=64 and values=200,000 the matching takes over 10 seconds, target would be well under one second... — Alexander
– Alexander, Commented Jul 27, 2017 at 11:52

Divakar · Accepted Answer · 2017-07-27 12:39:22Z

17

Here's one vectorized approach with np.searchsorted based on this post -

def closest_argmin(A, B):
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx==L] = L-1
    mask = (sorted_idx > 0) & \
    ((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
    return sidx_B[sorted_idx-mask]

Brief explanation :

Get the sorted indices for the left positions. We do this with - np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). Now, searchsorted expects sorted array as the first input, so we need some preparatory work there.
Compare the values at those left positions with the values at their immediate right positions (left + 1) and see which one is closest. We do this at the step that computes mask.
Based on whether the left ones or their immediate right ones are closest, choose the respective ones. This is done with the subtraction of indices with the mask values acting as the offsets being converted to ints.

Benchmarking

Original approach -

def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1

Timings and verification -

In [188]: refArray = np.random.random(16)
     ...: myArray = np.random.random(1000)
     ...: 

In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop

In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop

In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True

50x+ speedup for the posted sample and hopefully more for larger datasets!

edited Jul 27, 2017 at 12:39

answered Jul 27, 2017 at 12:05

Divakar

222k19 gold badges273 silver badges374 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Stefan Pochmann Over a year ago

Do you really need np.abs? I think you could also use (A - sorted_B[sorted_idx-1] < sorted_B[sorted_idx] - A) or (A < (sorted_B[sorted_idx-1] + sorted_B[sorted_idx]) / 2).

Stefan Pochmann Over a year ago

Also, I don't think using np.allclose to compare index arrays makes sense.

AGN Gazer Over a year ago

This is a very efficient solution. Compare to cKDTree in stackoverflow.com/a/52799252/8033585.

Simd Over a year ago

Who not sort B directly using np.sort?

Divakar Over a year ago

@Anush Because I needed sidx_B at the last step.

Make42 · Accepted Answer · 2019-12-10 16:58:43Z

10

An answer that is much shorter than that of @Divakar, also using broadcasting and even slightly faster:

abs(myArray[:, None] - refArray[None, :]).argmin(axis=-1)

answered Dec 10, 2019 at 16:58

Make42

13.3k28 gold badges92 silver badges172 bronze badges

Collectives™ on Stack Overflow

Find nearest indices for one array against all values in another array - Python / NumPy

2 Answers 2

5 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related