Filtering of array elements by another array in numpy

Question

Here is a simple example:

import numpy as np
x=np.random.rand(5,5)
k,p = np.where(x>0.5)

k and p are arrays of indices

Now I have a list of rows that should be considered, m=[0,2,4], so I need to find all entries of k that are in the list m.

I came up with a very simple but horribly inefficient solution:

d = np.array([ (a,b) for a,b in zip(k,p) if a in m])

The solution works, but it is very slow. I’m looking for a better and more efficient one. I need to do a few million such operations with a dynamically adjusted m, so the efficiency of the algorithm is really critical.

@U9-Forward as stated in the text, m is just a list, something like m=[0,2,4]. The example is really simple. In reality x is 5000x5000 and len(m) is about a few thousand.
– rth, Commented Dec 26, 2018 at 6:18

U13-Forward · Accepted Answer · 2018-12-26 23:15:25Z

3

Maybe the below is faster:

d=np.dstack((k,p))[0]
print(d[np.isin(d[:,0],m)])
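
A self-contained sketch of the approach above, checked against the list comprehension from the question. The fixed seed and the names `d_alt`, `filtered`, and `expected` are assumptions made for illustration. `np.column_stack` is shown only as an alternative way to build the same (N, 2) array as `np.dstack((k, p))[0]`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed is an assumption, for reproducibility
x = rng.random((5, 5))
k, p = np.where(x > 0.5)
m = [0, 2, 4]

# Pair up row and column indices as an (N, 2) array.
d = np.dstack((k, p))[0]
d_alt = np.column_stack((k, p))  # alternative construction, same result

# Keep only the pairs whose row index (first column) is in m.
filtered = d[np.isin(d[:, 0], m)]

# Reference result from the question's list comprehension.
expected = np.array([(a, b) for a, b in zip(k, p) if a in m]).reshape(-1, 2)
```

Both constructions give the same array, and the mask keeps the original order of the pairs.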

edited Dec 26, 2018 at 23:15

answered Dec 26, 2018 at 6:59

U13-Forward


7 Comments

rth Over a year ago

Could you please elaborate a bit more? Your solution also seems to work well, but it isn’t clear why it is better than Sharu’s solution.

U13-Forward Over a year ago

@rth I think it is cleaner, because you just create an array by zipping k and p, then simply get the rows where the first element is in the list m.

rth Over a year ago

Wait, but in this case it should be d[:,0]

rth Over a year ago

Well I have to admit your solution is the best

rth Over a year ago

I found that the construction array(list(zip())) is really slow, even for precomputations. It took almost 5 minutes at the script start, so I wasn’t happy. It seems np.dstack()[0] does the same job 20 times faster. Could you please update the answer, so the fastest method will be documented? Thank you!


Sharu · Accepted Answer · 2018-12-26 06:42:27Z

2

You could use isin() to get a boolean mask which you can use to index k.

>>> x=np.random.rand(3,3)
>>> x
array([[0.74043564, 0.48328081, 0.82396324],
       [0.40693944, 0.24951958, 0.18043229],
       [0.46623863, 0.53559775, 0.98956277]])
>>> k, p = np.where(x > 0.5)
>>> p
array([0, 2, 1, 2])
>>> k
array([0, 0, 2, 2])
>>> m
array([0, 1])  
>>> np.isin(k, m)
array([ True,  True, False, False])
>>> k[np.isin(k, m)]
array([0, 0])
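
Since the question asks for (row, column) pairs, the same mask can presumably be applied to p as well. A minimal sketch using the values from the session above (the names `mask`, `k_sel`, and `p_sel` are chosen here for illustration):

```python
import numpy as np

# Values taken from the session above.
k = np.array([0, 0, 2, 2])
p = np.array([0, 2, 1, 2])
m = np.array([0, 1])

mask = np.isin(k, m)             # True where the row index is in m
k_sel, p_sel = k[mask], p[mask]  # filter both index arrays with one mask

print(k_sel)  # [0 0]
print(p_sel)  # [0 2]
```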

edited Dec 26, 2018 at 6:42

answered Dec 26, 2018 at 6:29

Sharu


2 Comments

Sharu Over a year ago

I see. 'isin' was added in numpy 1.13. The equivalent in 1.12 would be in1d, I assume. You could try that if you're limited to an older version on your system. edit: I will leave this comment here in case someone else happens to be curious as to why isin wouldn't work.

Sharu Over a year ago

Great! Glad I could be of help.

andersource · Accepted Answer · 2018-12-26 06:29:46Z

0

How about:

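import numpy as np
x = np.random.rand(5, 5)  # same x as in the question
m = np.array([0, 2, 4])
# search only the rows listed in m
k, p = np.where(x[m, :] > 0.5)
k = m[k]  # map positions within m back to row numbers of x
print(list(zip(k, p)))  # list() is needed in Python 3 to show the pairs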

This only considers the interesting rows (and then zips them to 2d indices).
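
As a sanity check, here is a small sketch comparing this row-subset approach with filtering indices computed once from the full array. The seed and variable names are assumptions. The two results match in order only because m is sorted and has no duplicates here.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed is an assumption, for reproducibility
x = rng.random((5, 5))
m = np.array([0, 2, 4])         # sorted, no duplicates

# Approach from this answer: scan only the rows listed in m.
k_sub, p_sub = np.where(x[m, :] > 0.5)
pairs_sub = np.column_stack((m[k_sub], p_sub))

# Filtering indices that were computed once from the full array.
k, p = np.where(x > 0.5)
mask = np.isin(k, m)
pairs_full = np.column_stack((k[mask], p[mask]))

print(np.array_equal(pairs_sub, pairs_full))
```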

answered Dec 26, 2018 at 6:29

andersource


3 Comments

rth Over a year ago

An elegant solution, but unfortunately it doesn’t work that way. k and p are precalculated once, before filtering by different m. Running the comparison on a reduced but still huge array will be even slower.

andersource Over a year ago

You mean you get k and p using a different method, and just want to filter k using m regardless of x?

rth Over a year ago

X is a big but static matrix. There is no reason to scan it more than once to find k and p. I checked your algorithm, and it seems to be much slower than mine.
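
For the workflow described in these comments (k and p computed once, m changing often), one alternative that is not taken from the thread is a boolean lookup table indexed by row number. This is only a sketch: `filter_rows` and `wanted` are hypothetical names, and whether it beats `np.isin` for a given size would need to be measured.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed is an assumption, for reproducibility
x = rng.random((50, 50))
k, p = np.where(x > 0.5)        # computed once for the static matrix

def filter_rows(k, p, m, n_rows):
    """Return the (k, p) entries whose row index appears in m."""
    wanted = np.zeros(n_rows, dtype=bool)
    wanted[m] = True            # mark the rows of interest
    mask = wanted[k]            # one table lookup per entry of k
    return k[mask], p[mask]

# Can be called repeatedly with a different m each time.
k_sel, p_sel = filter_rows(k, p, [0, 2, 4], x.shape[0])
```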
