
Given two arrays of different sizes, aa and bb, I need to replace each element in aa with the closest element in bb.

This is what I have right now. It works [*], but I'm wondering if there's a better way.

import numpy as np

# Some random data
aa = np.random.uniform(0., 1., 100)
bb = np.array([.1, .2, .4, .55, .97])

# For each element in aa, find the index of the nearest element in bb
idx = np.searchsorted(bb, aa)
# Clamp indices that fall past the last bb element to the last valid index.
msk = idx > len(bb) - 1
idx[msk] = len(bb) - 1

# Replace values in aa (fancy indexing instead of a list comprehension)
aa = bb[idx]


[*]: actually it almost works. As pointed out in the comments, np.searchsorted doesn't return the index of the closest element, but "indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved", which is not the same.
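A small demonstration of the pitfall (values chosen here just for illustration): 0.21 is closest to 0.2, but searchsorted returns the insertion point, which lands on 0.4.

```python
import numpy as np

bb = np.array([.1, .2, .4, .55, .97])
# searchsorted returns where 0.21 would be inserted to keep bb sorted,
# which is index 2 (bb[2] == 0.4) -- not the index of the nearest element.
i = np.searchsorted(bb, 0.21)
print(i, bb[i])  # 2 0.4, even though 0.2 is nearer to 0.21
```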

2 Comments
  • searchsorted doesn't find the nearest element - you still need to check two candidates to see which is nearer. Commented Aug 22, 2018 at 18:20
  • That's a great point I hadn't really considered. Commented Aug 22, 2018 at 18:22

2 Answers


You have to calculate the absolute difference between every element in aa and every element in bb, then take the index of the minimum along the bb axis:

aa_nearest = bb[abs(aa[None, :] - bb[:, None]).argmin(axis=0)]
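To see the broadcasting at work, here is a small worked example (the values are hypothetical, just to make the result easy to check by hand):

```python
import numpy as np

aa = np.array([0.05, 0.5, 0.99])
bb = np.array([0.1, 0.2, 0.4, 0.55, 0.97])

# aa[None, :] has shape (1, 3) and bb[:, None] has shape (5, 1), so the
# difference broadcasts to a (5, 3) matrix of all pairwise distances.
dist = np.abs(aa[None, :] - bb[:, None])
# argmin over axis 0 picks, for each aa column, the bb row with the
# smallest distance.
aa_nearest = bb[dist.argmin(axis=0)]
print(aa_nearest)  # nearest bb values: 0.1, 0.55, 0.97
```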

5 Comments

Great answer, simple and fast. Thank you Daniel!
Note that it'll be very fast for small arrays and very slow for larger ones.
@DSM: what would be faster?
In my tests, even for a million elements in aa it takes less than 1 sec on my modest old laptop. The answer by Abhinav takes more than 11 sec.
The performance problems will come about when both aa and bb are larger, because we're creating an intermediate array of size aa * bb. As long as you know that the maximum size of bb is small you're going to be fine.
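When both arrays are large, the (len(bb), len(aa)) intermediate array becomes the bottleneck. One standard way around it, not part of the accepted answer, is a tree-based nearest-neighbour query; a sketch using scipy.spatial.cKDTree:

```python
import numpy as np
from scipy.spatial import cKDTree

aa = np.random.uniform(0., 1., 100_000)
bb = np.array([.1, .2, .4, .55, .97])

# cKDTree expects 2-D points, hence the reshape to column vectors.
tree = cKDTree(bb[:, None])
# Query the single nearest neighbour of each aa element; memory stays
# O(len(aa)) instead of O(len(aa) * len(bb)).
_, idx = tree.query(aa[:, None], k=1)
aa_nearest = bb[idx]
```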

No doubt the answer by Daniel is impressive, but it might be slow for large arrays, as the number of calculations and comparisons would be high.

Another way to do it would be:

import numpy as np
aa = np.random.uniform(0., 1., 100)
bb = np.array([.1, .2, .4, .55, .97])
idx = np.searchsorted(bb, aa)
msk = idx > len(bb) - 1
idx[msk] = len(bb) - 1

# Compare each element with the bb candidate to its left and keep the closer
# one (guarding idx == 0 so the index doesn't wrap around to the last element)
idx_new = np.array([idx[i] - 1
                    if idx[i] > 0 and abs(bb[idx[i] - 1] - aa[i]) < abs(bb[idx[i]] - aa[i])
                    else idx[i]
                    for i in range(len(idx))])
aa = bb[idx_new]

Here, after using searchsorted(), differences are calculated only for the adjacent values.
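The loop above can also be vectorized with the same adjacent-candidate logic (a sketch; np.clip guards both ends of the index range, so the explicit mask is no longer needed):

```python
import numpy as np

aa = np.random.uniform(0., 1., 100)
bb = np.array([.1, .2, .4, .55, .97])

idx = np.searchsorted(bb, aa)
left = np.clip(idx - 1, 0, len(bb) - 1)   # candidate to the left
right = np.clip(idx, 0, len(bb) - 1)      # candidate to the right
# Keep whichever adjacent candidate is closer to each aa element.
aa_nearest = np.where(np.abs(bb[left] - aa) < np.abs(bb[right] - aa),
                      bb[left], bb[right])
```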

4 Comments

Using indices and for-loops is much slower than matrix operations.
Please see my comment in the other answer; this is much slower than Daniel's method.
When mentioning performance, you should at least time your code.
OK, thanks, I got it. I just thought that, algorithm-wise, it has a smaller number of operations, but the conditional statements are taking time; maybe I should use some other way to apply the same algorithm.
