I have the following NumPy arrays:

import numpy as np
y2 = np.array([[0.2,0.1,0.8,0.4],[0.4,0.2,0.5,0.1],[0.4,0.2,0.5,0.1]])
y1 = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1]])

I am trying to find the rank position of each y1 label within y2. To be clearer: y1 is the label data (one-hot, one row per sample) and y2 is the predicted data, and for each row I want to know at which rank the algorithm placed the true label when the predicted scores are sorted in descending order.

I am doing the following:

counter = 0
indexes2 = []
indexes = np.where(y1)[1]
sorted_values = np.argsort(-y2)
for value in sorted_values:
    indexes2.append(np.where(value==indexes[counter])[0][0] + 1)
    counter += 1
b = np.array(indexes2)    

The output is correct:

>>> b
array([3, 3, 4], dtype=int64)

But I am pretty sure there is a more elegant and more efficient way of doing this. Any hints?

  • You could possibly try writing the for loop as a Python list comprehension. Commented Jan 20, 2017 at 15:54
  • Did the posted solution work for you? Commented Jan 22, 2017 at 18:11

1 Answer

Vectorize the loop

We could get rid of the loop by making use of broadcasting -

b = (sorted_values == indexes[:,None]).argmax(1)+1
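To make the broadcasting step concrete, here is a minimal sketch that runs it on the arrays from the question. The intermediate name mask is only for illustration and is not part of the original answer.

```python
import numpy as np

y2 = np.array([[0.2, 0.1, 0.8, 0.4],
               [0.4, 0.2, 0.5, 0.1],
               [0.4, 0.2, 0.5, 0.1]])
y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

indexes = np.where(y1)[1]        # column of the single 1 in each row
sorted_values = np.argsort(-y2)  # column indices ordered by descending score

# indexes[:, None] has shape (3, 1). Comparing it with the (3, 4) array
# broadcasts it across the columns, giving a boolean mask that is True
# exactly where the label's column index appears in the sorted order.
mask = sorted_values == indexes[:, None]

# argmax returns the position of the first True in each row;
# adding 1 turns the 0-based position into a 1-based rank.
b = mask.argmax(1) + 1
print(b)
```

Each row of mask holds exactly one True because every column index occurs once per row of sorted_values, so argmax is unambiguous here.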

Some Improvement

For performance, we could optimize the computation of indexes, like so -

indexes = y1.argmax(1)
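As a quick check, the sketch below compares the two ways of getting indexes on the question's y1. It assumes, as in the question, that every row of y1 contains exactly one 1; for an all-zero row argmax would silently return 0, while np.where would skip that row.

```python
import numpy as np

y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

# Original approach: column coordinates of every non-zero entry.
via_where = np.where(y1)[1]

# Suggested approach: position of the maximum in each row.
# With one-hot rows the maximum is the single 1.
via_argmax = y1.argmax(1)

print(via_where, via_argmax)
```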

Bigger Improvement

Additionally, we could speed up the sorted_values computation by skipping the negation of y2 and sorting in ascending order instead -

sorted_values2 = np.argsort(y2)

Then, compute b with the same broadcasted comparison as before and subtract the argmax positions from the length of each row. An ascending sort is the descending sort reversed, so this yields the same 1-based ranks as the negated argsort in the posted question.

Thus, the final step would be -

b = y2.shape[1] - (sorted_values2 == indexes[:,None]).argmax(1)
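Putting the three steps together, a minimal sketch could look like the following. The function name label_ranks and its parameter names are hypothetical, chosen only for this illustration. It assumes one-hot rows in the labels and no tied scores within a row; with ties, the ascending and negated sorts may order the tied columns differently.

```python
import numpy as np

def label_ranks(y_true, y_pred):
    """Return the 1-based descending rank of the true label's score per row.

    Hypothetical helper: assumes y_true is one-hot along axis 1.
    """
    indexes = y_true.argmax(1)       # column of the 1 in each row
    ascending = np.argsort(y_pred)   # ascending order, no negation needed
    # 0-based position of the label's column in the ascending order
    pos = (ascending == indexes[:, None]).argmax(1)
    # Position p from the start of an ascending sort of n items
    # corresponds to 1-based rank n - p in descending order.
    return y_pred.shape[1] - pos

y2 = np.array([[0.2, 0.1, 0.8, 0.4],
               [0.4, 0.2, 0.5, 0.1],
               [0.4, 0.2, 0.5, 0.1]])
y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

b = label_ranks(y1, y2)
print(b)
```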

3 Comments

Thanks for the answer, and my apologies for the late reply; I was AFK this weekend. As you mentioned, the third solution gives the best time. y1 always has only one non-zero value, yet the where and numpy.nonzero functions search the whole array to find the desired indexes. I am curious whether there is a way to avoid where or numpy.nonzero when finding the unique index where the 1 is. I wrote a function with a loop that breaks as soon as it finds the 1, but it is way slower. Any insights?
@MpizosDimitris As suggested in the post, did you try indexes = y1.argmax(1)?
Your suggestions decreased the time by around 15%. However, most of the time is spent in the argsort function, which I guess can't be optimized.
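On the argsort cost raised in the last comment: one possible alternative, not part of the posted answer, is to skip sorting altogether and count how many scores in each row are strictly greater than the true label's score. The sketch below assumes one-hot rows in y1; when scores are tied with the label's score, it returns the best (smallest) rank among the ties.

```python
import numpy as np

y2 = np.array([[0.2, 0.1, 0.8, 0.4],
               [0.4, 0.2, 0.5, 0.1],
               [0.4, 0.2, 0.5, 0.1]])
y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

indexes = y1.argmax(1)                 # column of the 1 in each row
rows = np.arange(y2.shape[0])
label_scores = y2[rows, indexes]       # predicted score of the true label

# Rank = 1 + number of scores in the row that beat the label's score.
# This is a linear pass per row instead of a sort.
b = (y2 > label_scores[:, None]).sum(1) + 1
print(b)
```

Whether this is actually faster than the argsort version depends on the array shapes, so it is worth timing on the real data.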
