
Given two arrays of different sizes, aa and bb, I need to replace each element in aa with the closest element in bb.

This is what I have right now. It works [*], but I'm wondering if there's a better way.

import numpy as np

# Some random data
aa = np.random.uniform(0., 1., 100)
bb = np.array([.1, .2, .4, .55, .97])

# For each element in aa, find the index of the nearest element in bb
idx = np.searchsorted(bb, aa)
# Clamp indices that fall past the last bb element to the last valid index.
msk = idx > len(bb) - 1
idx[msk] = len(bb) - 1

# Replace values in aa (fancy indexing instead of a list comprehension)
aa = bb[idx]


[*]: actually it almost works. As pointed out in the comments, np.searchsorted doesn't return the index of the closest element, but "indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved", which is not the same.
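A small demonstration of the pitfall (values chosen here just for illustration): 0.21 is closest to 0.2, but searchsorted returns the insertion point, which lands on 0.4.

```python
import numpy as np

bb = np.array([.1, .2, .4, .55, .97])
# searchsorted returns where 0.21 would be inserted to keep bb sorted,
# which is index 2 (bb[2] == 0.4) -- not the index of the nearest element.
i = np.searchsorted(bb, 0.21)
print(i, bb[i])  # 2 0.4, even though 0.2 is nearer to 0.21
```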

2 Comments
  • searchsorted doesn't find the nearest element - you still need to check two candidates to see which is nearer. Commented Aug 22, 2018 at 18:20
  • That's a great point I hadn't really considered. Commented Aug 22, 2018 at 18:22

2 Answers


You have to calculate the absolute difference between every element in aa and every element in bb, then take the index of the minimum along the bb axis:

aa_nearest = bb[abs(aa[None, :] - bb[:, None]).argmin(axis=0)]
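To see the broadcasting at work, here is a small worked example (the values are hypothetical, just to make the result easy to check by hand):

```python
import numpy as np

aa = np.array([0.05, 0.5, 0.99])
bb = np.array([0.1, 0.2, 0.4, 0.55, 0.97])

# aa[None, :] has shape (1, 3) and bb[:, None] has shape (5, 1), so the
# difference broadcasts to a (5, 3) matrix of all pairwise distances.
dist = np.abs(aa[None, :] - bb[:, None])
# argmin over axis 0 picks, for each aa column, the bb row with the
# smallest distance.
aa_nearest = bb[dist.argmin(axis=0)]
print(aa_nearest)  # nearest bb values: 0.1, 0.55, 0.97
```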

5 Comments

Great answer, simple and fast. Thank you Daniel!
Note that it'll be very fast for small arrays and very slow for larger ones.
@DSM: what would be faster?
In my tests, even for a million elements in aa it takes less than 1 sec on my modest old laptop. The answer by Abhinav takes more than 11 sec.
The performance problems will come about when both aa and bb are larger, because we're creating an intermediate array of size aa * bb. As long as you know that the maximum size of bb is small you're going to be fine.
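When both arrays are large, the (len(bb), len(aa)) intermediate array becomes the bottleneck. One standard way around it, not part of the accepted answer, is a tree-based nearest-neighbour query; a sketch using scipy.spatial.cKDTree:

```python
import numpy as np
from scipy.spatial import cKDTree

aa = np.random.uniform(0., 1., 100_000)
bb = np.array([.1, .2, .4, .55, .97])

# cKDTree expects 2-D points, hence the reshape to column vectors.
tree = cKDTree(bb[:, None])
# Query the single nearest neighbour of each aa element; memory stays
# O(len(aa)) instead of O(len(aa) * len(bb)).
_, idx = tree.query(aa[:, None], k=1)
aa_nearest = bb[idx]
```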

No doubt the answer by Daniel is impressive, but it might be slow for large arrays, as the number of calculations and comparisons would be high.

Another way to do it would be:

import numpy as np
aa = np.random.uniform(0., 1., 100)
bb = np.array([.1, .2, .4, .55, .97])
idx = np.searchsorted(bb, aa)
msk = idx > len(bb) - 1
idx[msk] = len(bb) - 1

# Compare each element with the bb candidate to its left and keep the closer
# one (guarding idx == 0 so the index doesn't wrap around to the last element)
idx_new = np.array([idx[i] - 1
                    if idx[i] > 0 and abs(bb[idx[i] - 1] - aa[i]) < abs(bb[idx[i]] - aa[i])
                    else idx[i]
                    for i in range(len(idx))])
aa = bb[idx_new]

Here, after using searchsorted(), differences are calculated only for the adjacent values.
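The loop above can also be vectorized with the same adjacent-candidate logic (a sketch; np.clip guards both ends of the index range, so the explicit mask is no longer needed):

```python
import numpy as np

aa = np.random.uniform(0., 1., 100)
bb = np.array([.1, .2, .4, .55, .97])

idx = np.searchsorted(bb, aa)
left = np.clip(idx - 1, 0, len(bb) - 1)   # candidate to the left
right = np.clip(idx, 0, len(bb) - 1)      # candidate to the right
# Keep whichever adjacent candidate is closer to each aa element.
aa_nearest = np.where(np.abs(bb[left] - aa) < np.abs(bb[right] - aa),
                      bb[left], bb[right])
```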

4 Comments

Using indices and for-loops is much slower than matrix operations.
Please see my comment in the other answer; this is much slower than Daniel's method.
When mentioning performance, you should at least time your code.
OK, thanks, I got it. I just thought that, algorithm-wise, it has a smaller number of operations, but the conditional statements are taking time; maybe I should use some other way to apply the same algorithm.
