Sorting and indexing between 2 numpy arrays

Question

I have the following NumPy arrays:

import numpy as np
y2 = np.array([[0.2,0.1,0.8,0.4],[0.4,0.2,0.5,0.1],[0.4,0.2,0.5,0.1]])
y1 = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1]])

What I am trying to do is get, for each row, the rank position in y2 of the entry marked in y1. To be clearer: y1 holds the label data (one 1 per row) and y2 holds the predicted scores, and I want to see at which rank position (1 = highest score) the algorithm placed the true label.

I am doing the following:

counter = 0
indexes2 = []
indexes = np.where(y1)[1]
sorted_values = np.argsort(-y2)
for value in sorted_values:
    indexes2.append(np.where(value==indexes[counter])[0][0] + 1)
    counter += 1
b = np.array(indexes2)

The output is correct:

>>> b
array([3, 3, 4], dtype=int64)

But I am pretty sure there is a more elegant and more efficient way of doing this. Any hints?

You could possibly try writing the for loop as a Python list comprehension.
– tooty44, Commented Jan 20, 2017 at 15:54

Divakar · Accepted Answer · 2017-01-20 16:12:11Z

Vectorize the loop

We could get rid of the loop by making use of broadcasting -

b = (sorted_values == indexes[:,None]).argmax(1)+1
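To make the broadcasting step concrete, here is a minimal sketch using the arrays from the question. The names mask, b_vectorized and b_loop are introduced here only for illustration; the loop is kept as a reference to check against.

```python
import numpy as np

y2 = np.array([[0.2, 0.1, 0.8, 0.4],
               [0.4, 0.2, 0.5, 0.1],
               [0.4, 0.2, 0.5, 0.1]])
y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

indexes = np.where(y1)[1]        # column of the 1 in each row, shape (3,)
sorted_values = np.argsort(-y2)  # column indices by descending score, shape (3, 4)

# indexes[:, None] has shape (3, 1), so each row of sorted_values is
# compared against that row's own label index.
mask = sorted_values == indexes[:, None]  # exactly one True per row
b_vectorized = mask.argmax(1) + 1         # position of that True, made 1-based

# Reference: the loop from the question, written as a list comprehension.
b_loop = np.array([np.where(row == idx)[0][0] + 1
                   for row, idx in zip(sorted_values, indexes)])

print(np.array_equal(b_vectorized, b_loop))
```

Because every row of sorted_values is a permutation of the column indices, the mask has a single True per row, which is why argmax can stand in for the per-row search.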

Some Improvement

For performance, we could optimize the computation of indexes, like so -

indexes = y1.argmax(1)
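As a quick check, the sketch below compares the two ways of locating the label on the one-hot array from the question. It also shows one caveat worth keeping in mind: the equivalence assumes every row really contains a 1. The names via_where, via_argmax and all_zero_row are illustrative only.

```python
import numpy as np

y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

via_where = np.where(y1)[1]  # builds index arrays for every nonzero entry
via_argmax = y1.argmax(1)    # index of the first maximum in each row

print(np.array_equal(via_where, via_argmax))

# Caveat: for a row with no 1 at all, argmax still returns index 0,
# whereas np.where simply has no entry for that row.
all_zero_row = np.zeros((1, 4), dtype=int)
print(all_zero_row.argmax(1), np.where(all_zero_row)[1])
```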

Bigger Improvement

Additionally, we could optimize the computation of sorted_values by avoiding the negation of y2 -

sorted_values2 = np.argsort(y2)

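Then, compute b by using the broadcasted comparison as done earlier and subtract the argmax indices from the length of each row. An element at 0-based position p in the ascending order has 1-based descending rank n - p, where n is the row length, so no + 1 is needed. This in effect gives the descending ordering along each row as in the posted question, where argsort was applied to the negated array.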

Thus, the final step would be -

b = y2.shape[1] - (sorted_values2 == indexes[:,None]).argmax(1)
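Putting the pieces together, a sketch of the full approach might look like the following. The function name label_ranks is hypothetical and not from the post; the result is checked against the negated-argsort version on the question's data.

```python
import numpy as np

def label_ranks(y_true, y_score):
    """Hypothetical helper: 1-based descending rank of each row's labelled column."""
    labels = y_true.argmax(1)        # column of the 1 in each row
    ascending = np.argsort(y_score)  # ascending order, no negated copy of y_score
    # 0-based position of the label, counted from the low-score end
    pos = (ascending == labels[:, None]).argmax(1)
    # Position p from the low end corresponds to rank n - p from the high end
    return y_score.shape[1] - pos

y2 = np.array([[0.2, 0.1, 0.8, 0.4],
               [0.4, 0.2, 0.5, 0.1],
               [0.4, 0.2, 0.5, 0.1]])
y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

ranks = label_ranks(y1, y2)
reference = (np.argsort(-y2) == y1.argmax(1)[:, None]).argmax(1) + 1
print(np.array_equal(ranks, reference))
```

One thing to be aware of: if a row contains equal scores, the ascending and negated orderings may break ties differently, so the two versions are only guaranteed to agree when the scores within each row are distinct, as they are here.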

edited Jan 20, 2017 at 16:12

answered Jan 20, 2017 at 15:53


3 Comments

Mpizos Dimitris Over a year ago

Thanks for the answer, and my apologies for the late reply; I was AFK this weekend. As you mentioned, the third solution gives the best time. y1 always has only one nonzero value per row; however, the where and numpy.nonzero functions search the whole array to find the desired indexes. I am curious whether there is a way to avoid where or numpy.nonzero when finding the unique index where the 1 is located. I created a function with a loop that breaks as soon as it finds the 1, but it is way slower. Any insights?

Divakar Over a year ago

@MpizosDimitris As suggested in the post, did you try indexes = y1.argmax(1)?

Mpizos Dimitris Over a year ago

Your suggestions decreased the time by around 15%. However, most of the time is spent in the argsort function, which I guess can't be optimized.
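On the argsort cost raised in the last comment: one possible alternative, which is not part of the answer above, is to skip sorting altogether. The rank of the true label equals one plus the number of entries in its row with a strictly larger score, and counting those needs only a single pass per row. The sketch below assumes that tied scores should share the better rank; it has not been timed here, so any speedup is something to measure on real data.

```python
import numpy as np

y2 = np.array([[0.2, 0.1, 0.8, 0.4],
               [0.4, 0.2, 0.5, 0.1],
               [0.4, 0.2, 0.5, 0.1]])
y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

rows = np.arange(y2.shape[0])
labels = y1.argmax(1)            # column of the 1 in each row
label_scores = y2[rows, labels]  # score assigned to each true label

# Count the entries that beat the label's score, then add one for a 1-based rank.
# Assumption: ties go to the better rank, since only strictly larger scores count.
ranks_no_sort = (y2 > label_scores[:, None]).sum(1) + 1

# Cross-check against the sort-based result on this tie-free data.
reference = (np.argsort(-y2) == labels[:, None]).argmax(1) + 1
print(np.array_equal(ranks_no_sort, reference))
```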
