I have a large 2D np.array (vec). I would like to replace each value in vec with the closest value from a shorter array vals.

I have tried the following

replaced_vals=vals[np.argmin(np.abs(vec[:, np.newaxis] - vals), axis=0)]

but it does not work because vec and vals have different sizes.

Example input

vec = np.array([10.1,10.7,11.4,102,1100])
vals = np.array([10.0,11.0,100.0])

Desired output:

replaced_vals = [10.0,11.0,11.0,100.0,100.0]
3 Answers

If your vals array is sorted, a more memory-efficient, and possibly faster, solution is possible via np.searchsorted:

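import numpy as np

def jpp(vec, vals):
    # insertion index of each vec element in the sorted vals
    ss = np.searchsorted(vals, vec)
    # left neighbour (ss == 0 wraps around to vals[-1], which is never closer than vals[0])
    a = vals[ss - 1]
    # right neighbour, clipped to the last element
    b = vals[np.minimum(len(vals) - 1, ss)]
    # keep whichever neighbour is closer
    return np.where(np.fabs(vec - a) < np.fabs(vec - b), a, b)

vec = np.array([10.1,10.7,11.4,102,1100])
vals = np.array([10.0,11.0,100.0])

print(jpp(vec, vals))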

[  10.   11.   11.  100.  100.]
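To see what jpp is doing, it can help to look at the intermediate arrays for the example input. The following is only an illustrative sketch: the 2D array `grid` is a made-up input, and it relies on np.searchsorted returning indices with the same shape as the values being searched for, which is why the same function should also cope with a 2D vec.

```python
import numpy as np

def jpp(vec, vals):
    ss = np.searchsorted(vals, vec)
    a = vals[ss - 1]
    b = vals[np.minimum(len(vals) - 1, ss)]
    return np.where(np.fabs(vec - a) < np.fabs(vec - b), a, b)

vec = np.array([10.1, 10.7, 11.4, 102, 1100])
vals = np.array([10.0, 11.0, 100.0])

# Intermediate arrays for the 1D example
ss = np.searchsorted(vals, vec)            # insertion points in vals
a = vals[ss - 1]                           # candidate on the left
b = vals[np.minimum(len(vals) - 1, ss)]    # candidate on the right (clipped)
print(ss)
print(a)
print(b)

# Hypothetical 2D input, including a value below vals[0]
grid = np.array([[9.0, 10.7],
                 [11.4, 1100.0]])
result_2d = jpp(grid, vals)
print(result_2d)
```

The output keeps the shape of the input, so no reshaping or flattening is needed for the 2D case.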

Performance benchmarking

# Python 3.6.0, NumPy 1.11.3

n = 10**6
vec = np.array([10.1,10.7,11.4,102,1100]*n)
vals = np.array([10.0,11.0,100.0])

# @ThomasPinetz's solution, memory inefficient
def tho(vec, vals):
    return vals[np.argmin(np.abs(vec[:, np.newaxis] - vals), axis=1)]

def jpp(vec, vals):
    ss = np.searchsorted(vals, vec)
    a = vals[ss - 1]
    b = vals[np.minimum(len(vals) - 1, ss)]
    return np.where(np.fabs(vec - a) < np.fabs(vec - b), a, b)

# @Divakar's solution, adapted from first related Q&A link
def diva(A, B):
    L = B.size
    sorted_idx = np.searchsorted(B, A)
    sorted_idx[sorted_idx==L] = L-1
    mask = (sorted_idx > 0) & \
    ((np.abs(A - B[sorted_idx-1]) < np.abs(A - B[sorted_idx])) )
    return B[sorted_idx-mask]

assert np.array_equal(tho(vec, vals), jpp(vec, vals))
assert np.array_equal(tho(vec, vals), diva(vec, vals))

%timeit tho(vec, vals)   # 366 ms per loop
%timeit jpp(vec, vals)   # 295 ms per loop
%timeit diva(vec, vals)  # 334 ms per loop
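The %timeit lines above need IPython. As a rough sketch of how the comparison could be reproduced in a plain Python script, the standard-library timeit module can be used instead. The smaller n here is my own choice to keep the run short, and the byte count is only an estimate of the single (len(vec), len(vals)) temporary that the broadcasting approach has to allocate; actual timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def tho(vec, vals):
    return vals[np.argmin(np.abs(vec[:, np.newaxis] - vals), axis=1)]

def jpp(vec, vals):
    ss = np.searchsorted(vals, vec)
    a = vals[ss - 1]
    b = vals[np.minimum(len(vals) - 1, ss)]
    return np.where(np.fabs(vec - a) < np.fabs(vec - b), a, b)

n = 10**4  # assumption: much smaller than the n = 10**6 used above
vec = np.array([10.1, 10.7, 11.4, 102, 1100] * n)
vals = np.array([10.0, 11.0, 100.0])

# Size of one (len(vec), len(vals)) float64 temporary built by tho
temp_bytes = vec.size * vals.size * vec.itemsize
print(f"tho temporary: {temp_bytes / 1e6:.1f} MB")

for fn in (tho, jpp):
    # best of 3 repeats, 10 calls each
    best = min(timeit.repeat(lambda: fn(vec, vals), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```

With the original n = 10**6 the same estimate comes to 120 MB per temporary, which is where the "memory inefficient" remark comes from.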

Related Q&A

  1. Find nearest indices for one array against all values in another array - Python / NumPy
  2. Find nearest value in numpy array

You have to look along the other axis to get the desired values like this:

replaced_vals=vals[np.argmin(np.abs(vec[:, np.newaxis] - vals), axis=1)]

Output for your problem:

array([  10.,   11.,   11.,  100.,  100.])
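Printing the shapes makes the axis choice easier to see: the broadcast difference has one row per element of vec and one column per element of vals, so the argmin has to run across the columns. Since the question mentions a 2D array, the sketch below also includes a variant that adds the new axis at the end. The name `nearest_nd` and the array `grid` are made up for this illustration.

```python
import numpy as np

vec = np.array([10.1, 10.7, 11.4, 102, 1100])
vals = np.array([10.0, 11.0, 100.0])

diff = np.abs(vec[:, np.newaxis] - vals)
print(diff.shape)                      # one row per vec element, one column per vals element
print(np.argmin(diff, axis=0).shape)   # one index per vals element: not what we want
print(np.argmin(diff, axis=1).shape)   # one index per vec element

def nearest_nd(vec, vals):
    # New axis goes last, so this works for vec of any dimensionality
    idx = np.argmin(np.abs(vec[..., np.newaxis] - vals), axis=-1)
    return vals[idx]

grid = vec[:4].reshape(2, 2)           # hypothetical 2D input
nearest_grid = nearest_nd(grid, vals)
print(nearest_grid)
```

Note that this still builds a temporary with vec.size * vals.size elements, so it is best suited to a short vals.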


If vals is sorted, x_k from vec must be rounded to y_i from vals if:

    (y_(i-1) + y_i)/2 <= x_k < (y_i + y_(i+1))/2

So here is yet another solution using np.searchsorted, with fewer operations and at least twice as fast:

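def bm(vec, vals):
    # midpoints between consecutive vals: half[i] = (vals[i] + vals[i+1]) / 2
    half = vals.copy() / 2
    half[:-1] += half[1:]
    # sentinel so that values above the last midpoint map to the last index
    half[-1] = np.inf
    ss = np.searchsorted(half, vec)
    return vals[ss]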

%timeit bm(vec, vals)  # 84 ms per loop
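For the example input, the midpoint idea can be checked by hand. This sketch computes the midpoints without the in-place update (the name `mids` is mine), which also makes the np.inf sentinel unnecessary because np.searchsorted can return len(mids), i.e. the last index of vals. One caveat worth knowing: with the default side='left', a value lying exactly on a midpoint goes to the lower neighbour.

```python
import numpy as np

vec = np.array([10.1, 10.7, 11.4, 102, 1100])
vals = np.array([10.0, 11.0, 100.0])

# Midpoints between consecutive entries of the sorted vals
mids = (vals[:-1] + vals[1:]) / 2
print(mids)

# Each vec element falls into one of the len(vals) bins delimited by mids
idx = np.searchsorted(mids, vec)
print(idx)
print(vals[idx])

# A value exactly on a midpoint is assigned to the lower neighbour
tie = vals[np.searchsorted(mids, 10.5)]
print(tie)
```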

If vec is also sorted, you can finish the job with numba for another speedup:

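from numba import njit

@njit
def bmm(vec, vals):
    # assumes both vec and vals are sorted in ascending order
    half = vals.copy() / 2
    half[:-1] += half[1:]
    half[-1] = np.inf
    res = np.empty_like(vec)
    i = 0
    for k in range(vec.size):
        # i only ever moves forward, hence the sorted-vec requirement
        while half[i] < vec[k]:
            i += 1
        res[k] = vals[i]
    return res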

%timeit bmm(vec, vals)  # 31 ms per loop
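The loop in bmm never moves i backwards, so it gives wrong results if vec is not in ascending order. As a sketch of one possible workaround, assuming a float vec, the input can be sorted first and the results scattered back to the original positions. The function names below are made up, and the walk is written in plain Python so it runs without numba; decorating `walk_sorted` with @njit should work the same way but is not tested here. The extra argsort may well cancel out the speed gain, so this is about correctness rather than performance.

```python
import numpy as np

def walk_sorted(vec, vals):
    """Two-pointer walk; assumes vec and vals are both sorted ascending."""
    half = vals / 2                 # new array, vals is left untouched
    half[:-1] += half[1:]           # midpoints between consecutive vals
    half[-1] = np.inf               # sentinel for values above the last midpoint
    res = np.empty_like(vec)
    i = 0
    for k in range(vec.size):
        while half[i] < vec[k]:
            i += 1
        res[k] = vals[i]
    return res

def walk_any_order(vec, vals):
    """Sort vec, walk it, then put the results back in the original order."""
    order = np.argsort(vec, kind="stable")
    res = np.empty_like(vec)
    res[order] = walk_sorted(vec[order], vals)
    return res

vals = np.array([10.0, 11.0, 100.0])
shuffled = np.array([1100.0, 10.1, 102.0, 11.4, 10.7])  # hypothetical unsorted input

wrong = walk_sorted(shuffled, vals)      # pointer gets stuck at the last bin
right = walk_any_order(shuffled, vals)
print(wrong)
print(right)
```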
