Filter rows in numpy array based on second array

Question

I have two 2D NumPy arrays, A and B. I want to remove all the rows in A that also appear in B.

I tried something like this:

A[~np.isin(A, B)]

but isin keeps the dimensions of A, whereas I need one boolean value per row to filter it.

EDIT: something like this

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

.....

A = np.array([[3, 0, 4],
              [0, 5, 9]])

I think a short example would better illustrate your question.
– mozway, Commented Jan 20, 2022 at 16:27
How big are your arrays in practice? Are the items bounded by a small value? Are they always positive integers?
– Jérôme Richard, Commented Jan 20, 2022 at 18:11

Mad Physicist · Accepted Answer · 2022-01-20 18:48:25Z

2

Probably not the most performant solution, but it does exactly what you want. You can view A and B with a dtype whose single element spans one whole row, so that each row is compared as one item. The arrays must be C-contiguous for the view to work, which you can ensure with np.ascontiguousarray:

Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()

Now you can apply np.isin directly:

>>> np.isin(Av, Bv)
array([False,  True, False])

According to the docs, invert=True is faster than negating the output of isin, so you can do

A[np.isin(Av, Bv, invert=True)]
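Putting the steps together, here is a minimal self-contained sketch using the arrays from the question. The helper name rows_not_in is made up for illustration and is not part of NumPy; it assumes A and B are 2D, share a dtype, and have the same number of columns.

```python
import numpy as np

def rows_not_in(A, B):
    """Return the rows of A that do not occur in B (hypothetical helper)."""
    # Assumption: A and B are 2D with the same dtype and column count.
    A = np.ascontiguousarray(A)
    B = np.ascontiguousarray(B)
    # One structured element spans a whole row, so each row compares as one item.
    row_dtype = np.dtype([('', A.dtype, A.shape[1])])
    Av = A.view(row_dtype).ravel()
    Bv = B.view(row_dtype).ravel()
    return A[np.isin(Av, Bv, invert=True)]

A = np.array([[3, 0, 4], [3, 1, 1], [0, 5, 9]])
B = np.array([[1, 1, 1], [3, 1, 1]])
result = rows_not_in(A, B)
print(result)

# Fortran-ordered input (for example the result of a transpose)
# is copied to C order inside the helper before the view is taken.
result_f = rows_not_in(np.asfortranarray(A), B)
print(result_f)
```

The copy made by np.ascontiguousarray only happens when the input is not already C-contiguous, so the common case stays a cheap view.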

edited Jan 20, 2022 at 18:48

answered Jan 20, 2022 at 17:34

Mad Physicist

7 Comments

CJR Over a year ago

This is clever, but: A = np.array([[3, 0, 4], [3, 1, 1], [0, 5, 9]], order="F") (and before you go "oh that's just a weird corner condition", that's the order returned by .transpose())

Mad Physicist Over a year ago

@CJR. I am quite aware of where order='F' comes in. Clearly this only works if the last axis is contiguous. Being able to create a view with Fortran ordered arrays is actually deprecated and you should see a warning. I removed it for the next release: github.com/numpy/numpy/pull/20722

Jérôme Richard Over a year ago

What about using a copy to ensure the ordering/contiguity is correct?

Mad Physicist Over a year ago

@JérômeRichard. Added

Mad Physicist Over a year ago

@CJR. Fixed the copy issue

peru_45 · Accepted Answer · 2022-01-20 17:08:50Z

1

Try the following - it uses matrix multiplication for dimensionality reduction:

import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
print(A[~np.isin(A.dot(arr_max), B.dot(arr_max))])

Output:

[[3 0 4]
 [0 5 9]]

edited Jan 20, 2022 at 17:08

answered Jan 20, 2022 at 16:42

peru_45

6 Comments

user2505961 Over a year ago

Oh wow, yes this does what I wanted, removing rows from A which are present in B. But I have no idea how it works :D

Community Over a year ago

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

CJR Over a year ago

This is clever, but: A = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1]]) and B = np.array([[1, 1, 0], [1, 0, 1]])

Mad Physicist Over a year ago

Clever. Not perfect, but a great idea

Jérôme Richard Over a year ago

The answer probably does not always work, since the dot product does not generate unique values. I think a cumulative product is missing (like when we flatten an ND array). Additionally, if the maximum value or the size of the array is large, there will certainly be overflows that cause bugs.
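The collision described in the comments above can be reproduced with CJR's arrays. The sketch below also shows one way to apply the cumulative-product idea: each column gets a weight equal to the product of the value ranges of the columns before it, much like flattening an N-dimensional index. It assumes non-negative integers whose combined range fits in a 64-bit integer, and the variable names are illustrative only.

```python
import numpy as np

A = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1]])
B = np.array([[1, 1, 0], [1, 0, 1]])

# Weights from the answer: different rows can share the same dot product.
arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
collided = A[~np.isin(A.dot(arr_max), B.dot(arr_max))]
print(collided)  # the row [0, 1, 1] is dropped although it is not a row of B

# Mixed-radix weights: column j is scaled by the product of the
# value ranges of columns 0..j-1, so every distinct row gets a distinct key.
# Assumption: all values are non-negative and prod(dims) fits in int64.
dims = np.maximum(A.max(0), B.max(0)) + 1
weights = np.concatenate(([1], np.cumprod(dims[:-1])))
unique_keys = A[~np.isin(A.dot(weights), B.dot(weights))]
print(unique_keys)
```

With these weights the overflow concern still applies: the largest key is roughly the product of all column ranges, so wide arrays or large values can exceed the integer type.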

user2505961 · Accepted Answer · 2022-02-03 10:37:28Z

0

This is certainly not the most performant solution but it is relatively easy to read:

A = np.array([row for row in A if row not in B])

Edit:

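I found that the code above does not work correctly: for NumPy arrays, row in B is true as soon as any single element of B equals the element of row in the same column. This version compares whole rows instead:

A = np.array([row for row in A if not np.equal(B, row).all(axis=1).any()])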
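The row-by-row comparison above can also be written without a Python loop by using broadcasting. This is only a sketch with the arrays from the question; the names eq, in_B and filtered are illustrative.

```python
import numpy as np

A = np.array([[3, 0, 4], [3, 1, 1], [0, 5, 9]])
B = np.array([[1, 1, 1], [3, 1, 1]])

# eq[i, j, k] is True when A[i, k] == B[j, k];
# its shape is (len(A), len(B), n_columns).
eq = A[:, None, :] == B[None, :, :]
# Row i of A is in B if all of its columns match some row j of B.
in_B = eq.all(axis=2).any(axis=1)
filtered = A[~in_B]
print(filtered)
```

The intermediate boolean array grows with len(A) * len(B) * n_columns, so this form suits small to medium inputs rather than very large ones.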

edited Feb 3, 2022 at 10:37

answered Jan 21, 2022 at 7:00

user2505961
