2

Here is a simple example:

import numpy as np
x=np.random.rand(5,5)
k,p = np.where(x>0.5)

k and p are arrays of indices

Now I have a list of rows to consider, m=[0,2,4], so I need to find all entries of k that are in the list m.

I came up with a very simple but horribly inefficient solution:

d = np.array([ (a,b) for a,b in zip(k,p) if a in m])

The solution works, but it is very slow. I’m looking for a better and more efficient one. I need to do a few million such operations with a dynamically adjusted m, so the efficiency of the algorithm is really critical.

2
  • @U9-Forward as stated in the text, m is just a list, something like m=[0,2,4]. The example is deliberately simple. In reality x is 5000x5000 and len(m) is a few thousand. Commented Dec 26, 2018 at 6:18
  • Sorry, I missed that part. :-) Commented Dec 26, 2018 at 6:19

3 Answers

3

Maybe the below is faster:

d=np.dstack((k,p))[0]
print(d[np.isin(d[:,0],m)])
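
For readers who want to run this end to end, here is a minimal sketch on the 5x5 example from the question. The seeded generator and the names `mask`, `filtered` and `d_alt` are illustrative assumptions, not part of the original answer. The sketch also shows `np.column_stack`, which builds the same (N, 2) array as `np.dstack((k, p))[0]`.

```python
import numpy as np

# Assumption: a seeded generator, used only to make the sketch reproducible.
rng = np.random.default_rng(0)
x = rng.random((5, 5))
k, p = np.where(x > 0.5)
m = [0, 2, 4]

# Pair row and column indices into an (N, 2) array, as in the answer.
d = np.dstack((k, p))[0]
# np.column_stack produces the same (N, 2) array without the extra axis.
d_alt = np.column_stack((k, p))

# Boolean mask: True where the row index (first column) is in m.
mask = np.isin(d[:, 0], m)
filtered = d[mask]
print(filtered)
```

The result should hold the same (row, column) pairs as the list comprehension in the question, only built without a Python-level loop.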

7 Comments

Could you please elaborate a bit more? Your solution also seems to work well, but it isn’t clear why it is better than Sharu’s solution.
@rth I think it is cleaner, because you just create an array that pairs k with p, then simply take the rows whose first element is in the list m.
Wait, but in this case it should be d[:,0]
Well I have to admit your solution is the best
I found that the construction array( list( zip( ) ) ) is really slow, even for precomputation. It took almost 5 minutes at script start, so I wasn’t happy. It seems np.dstack()[0] does the same job 20 times faster. Could you please update the answer, so that the fastest method is documented? Thank you!
2

You could use isin() to get a boolean mask which you can use to index k.

>>> x=np.random.rand(3,3)
>>> x
array([[0.74043564, 0.48328081, 0.82396324],
       [0.40693944, 0.24951958, 0.18043229],
       [0.46623863, 0.53559775, 0.98956277]])
>>> k, p = np.where(x > 0.5)
>>> p
array([0, 2, 1, 2])
>>> k
array([0, 0, 2, 2])
>>> m
array([0, 1])  
>>> np.isin(k, m)
array([ True,  True, False, False])
>>> k[np.isin(k, m)]
array([0, 0])
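
The question asks for (row, column) pairs, so the same mask would presumably be applied to p as well. A minimal sketch, reusing the index arrays from the session above (the names `mask`, `k_sel` and `p_sel` are illustrative):

```python
import numpy as np

# Index arrays copied from the interactive session above.
k = np.array([0, 0, 2, 2])
p = np.array([0, 2, 1, 2])
m = np.array([0, 1])

# True where the row index is one of the rows listed in m.
mask = np.isin(k, m)

# Apply the same mask to both arrays so rows and columns stay aligned.
k_sel, p_sel = k[mask], p[mask]
print(k_sel, p_sel)
```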

2 Comments

I see. 'isin' was added in numpy 1.13. The equivalent in 1.12 would be in1d, I assume. You could try that if you're limited to an older version on your system. Edit: I will leave this comment here in case someone else is curious as to why isin wouldn't work.
Great! Glad I could be of help.
0

How about:

import numpy as np
m = np.array([0, 2, 4])
k, p = np.where(x[m, :] > 0.5)
k = m[k]
print(list(zip(k, p)))  # list() is needed in Python 3, where zip returns an iterator

This only considers the interesting rows (and then zips them to 2d indices).

3 Comments

Elegant solution, but unfortunately it doesn’t work for my case. k and p are precomputed once, before filtering by different m. Rerunning the comparison on a reduced but still huge array would be even slower.
You mean you get k and p using a different method, and just want to filter k using m regardless of x?
x is a big but static matrix. There is no reason to scan it more than once to find k and p. I checked your algorithm, and it seems to be much slower than mine.
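
One way to follow the "scan x only once" constraint is to compute k and p a single time and then filter them for each new m. The sketch below swaps in a boolean lookup table over the row numbers instead of np.isin; this is a different technique from the answers above, and whether it is faster on a 5000x5000 matrix is untested here and would need timing. The function name `filter_rows`, the 50x50 stand-in matrix and the seed are all hypothetical.

```python
import numpy as np

# Assumption: a small seeded matrix standing in for the large static x.
rng = np.random.default_rng(0)
x = rng.random((50, 50))

# Scan x once; k and p are reused for every m.
k, p = np.where(x > 0.5)

def filter_rows(k, p, m, n_rows):
    """Keep the (k, p) pairs whose row index appears in m."""
    # Lookup table: wanted[r] is True when row r is listed in m.
    wanted = np.zeros(n_rows, dtype=bool)
    wanted[m] = True
    # Index the table with k to get one boolean per (k, p) pair.
    mask = wanted[k]
    return k[mask], p[mask]

k_sel, p_sel = filter_rows(k, p, [0, 2, 4], x.shape[0])
print(np.column_stack((k_sel, p_sel)))
```

The lookup table costs one array of n_rows booleans per call, which is small next to k and p when x is large.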
