More optimal way to find then sort from two numpy arrays

Question

For simplicity, take the small example below. Let's say we have two NumPy arrays, values and distances. I'd like to find the values that are at least 1 and sort them by their corresponding distances. If several values have the same distance, I'd like the higher value sorted first.

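import numpy as np

v = np.array([[1.0,2.0,0.0],[1.0,0.0,0.0],[0.0,0.0,0.0]])
d = np.array([[1.5,1.0,1.5],[1.0,0.0,1.0],[1.5,1.0,1.5]])

# Coordinates of every entry of v that is at least 1
indexes = np.argwhere(v >= 1)

# Lazily build (distance, value, (row, col)) for each selected entry;
# named `candidates` so the built-in `list` is not shadowed
candidates = ((d[r,c], v[r,c], (r,c)) for r, c in indexes)

# Sort by ascending distance, breaking ties by descending value
closest_highest = sorted(candidates, key=lambda t: (t[0], -t[1]))
print(closest_highest)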

output:

[(1.0, 2.0, (0, 1)), (1.0, 1.0, (1, 0)), (1.5, 1.0, (0, 0))]

Each tuple contains the distance, value, and its coordinates from the two arrays.

Is there a faster way to do the above using just numpy/vectorized computations? If not, is there a faster or more efficient way to get the same ordering? I don't really need it to return a tuple; just the index is enough. Even just the index of the lowest distance with the highest value is enough.

"If there are values with similar distances, I'd like to have it sorted with the higher value first" - Which higher values? Apart from the distances, there are only row and column indices.
– Divakar, Commented Jan 3, 2018 at 19:31
The value from the 'v' array. As in the example, indexes (0,1) and (1,0) both have a distance of 1.0, but (0,1) comes first since its value '2' is higher than the value '1' at index (1,0). Hope that helps.
– user1179317, Commented Jan 3, 2018 at 19:33

Divakar · Accepted Answer · 2018-01-03 20:30:08Z

Approach #1: Here's one way to get the index of the lowest distance with the highest value:

# Get row, col indices for the condition
r,c = np.where(v >= 1)

# Extract the corresponding distances from d
di = d[r,c]

# Get indices (indexable into r,c) corresponding to lowest distance
ld_indx = np.flatnonzero(di == di.min())

# Get max index (based off v) out of the selected indices
max_v_idx = v[r[ld_indx], c[ld_indx]].argmax()

# Get the index (indexable into r,c) with the max one based off v
max_idx = ld_indx[max_v_idx]

# Index into r,c with it
lowest_index_out = (r[max_idx], c[max_idx])

Think of it as a two-step filter. The first step keeps only the entries whose di equals the minimum distance (ld_indx). The second step takes the argmax() of v over those survivors (max_v_idx) to select a single winner. max_idx then traces that winner back to a position in r and c, which gives the final indexing tuple.
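To make the two filtering steps concrete, here is a small trace of Approach #1 on the arrays from the question, with the intermediate arrays noted in comments. The name `candidates` is introduced only for this illustration, and the `int()` conversions are added just so the printed tuple shows plain Python integers.

```python
import numpy as np

v = np.array([[1.0, 2.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
d = np.array([[1.5, 1.0, 1.5], [1.0, 0.0, 1.0], [1.5, 1.0, 1.5]])

# Positions where v >= 1: (0,0), (0,1), (1,0)
r, c = np.where(v >= 1)                    # r = [0 0 1], c = [0 1 0]

# Distances at those positions
di = d[r, c]                               # [1.5 1.0 1.0]

# Step 1: keep the positions (within r, c) that share the minimum distance
ld_indx = np.flatnonzero(di == di.min())   # [1 2]

# Step 2: among those survivors, find the one with the largest value
candidates = v[r[ld_indx], c[ld_indx]]     # [2.0 1.0]
max_v_idx = candidates.argmax()            # 0

# Trace back to a position within r, c
max_idx = ld_indx[max_v_idx]               # 1

lowest_index_out = (int(r[max_idx]), int(c[max_idx]))
print(lowest_index_out)  # (0, 1)
```

Since `r[ld_indx]` is itself an array, `r[ld_indx][max_v_idx]` selects the same element as `r[max_idx]`. Note that `di.min()` raises an error when no entry of `v` satisfies the condition, so an empty selection would need to be handled separately.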

Approach #2: The same idea, relying more on boolean masking:

indexes = np.argwhere(v >= 1)

di = d[indexes[:,0],indexes[:,1]]
valid_mask = di == di.min()
indexes_mask = indexes[valid_mask]

maxv_indx = v[indexes_mask[:,0],indexes_mask[:,1]].argmax()
lowest_index_out = indexes_mask[maxv_indx]
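Both approaches above return only the single best index. If the full ordering from the question is wanted, one possible vectorized sketch (not part of the original answer) uses `np.lexsort`, which sorts by the last key first. The names `vi`, `order`, and `sorted_indexes` are chosen here for illustration.

```python
import numpy as np

v = np.array([[1.0, 2.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
d = np.array([[1.5, 1.0, 1.5], [1.0, 0.0, 1.0], [1.5, 1.0, 1.5]])

# Positions where v >= 1, with their distances and values
r, c = np.where(v >= 1)
di = d[r, c]
vi = v[r, c]

# np.lexsort treats the LAST key as the primary one:
# sort by ascending distance, then by descending value (hence -vi) on ties
order = np.lexsort((-vi, di))

# One (row, col) pair per line, closest and highest first
sorted_indexes = np.column_stack((r[order], c[order]))
print(sorted_indexes.tolist())  # [[0, 1], [1, 0], [0, 0]]
```

The first row of `sorted_indexes` is the same index that Approaches #1 and #2 produce, and `di[order]` and `vi[order]` give the matching distances and values if they are needed.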

edited Jan 3, 2018 at 20:30

answered Jan 3, 2018 at 19:45

8 Comments

user1179317 Over a year ago

Question about the first approach. I understand everything except the last line. I know what it's doing, but I can't seem to understand how it works. For v[r,c][lidx].argmax(): are you passing r,c as the row index of v? How does that work?

user1179317 Over a year ago

Actually I am still a bit confused on that last statement :P

Divakar Over a year ago

@user1179317 See if the added comments help out.

user1179317 Over a year ago

Not sure why I am having trouble grasping this. For me, 'r' and 'c' are 1-dimensional arrays. 'di' is a 1D array as well, containing the distances where v>=1. 'ld_indx' is also a 1D array, containing the indices of the smallest distances. 'max_idx' should just be a single number marking where the max value is, based on the indices from ld_indx. That much I still understand. But r[ld_indx][max_idx] is a bit confusing since 'r' is a 1D array. I know it's working, but I am a little confused about how 'r' can take two indices, ld_indx and max_idx.

user1179317 Over a year ago

Nevermind....r[ld_indx] returns multiple values..duh. Sorry. And thanks for the solution
