I have two 2D NumPy arrays, A and B. I want to remove all the rows in A that also appear in B.

I tried something like this:

A[~np.isin(A, B)]

but isin returns a mask with the same shape as A, and I need one boolean value per row to filter it.

EDIT: For example, given these inputs:

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

the expected result is:

A = np.array([[3, 0, 4],
              [0, 5, 9]])
  • I think a short example would better illustrate your question. Commented Jan 20, 2022 at 16:27
  • How big are your arrays in practice? Are the items bounded by a small value? Are they always positive integers? Commented Jan 20, 2022 at 18:11

3 Answers

This is probably not the most performant solution, but it does exactly what you want. You can view A and B with a dtype in which a single element spans one whole row. The arrays must be C-contiguous for this to work, which you can ensure with np.ascontiguousarray:

Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()

Now you can apply np.isin directly:

>>> np.isin(Av, Bv)
array([False,  True, False])

According to the docs, invert=True is faster than negating the output of isin, so you can do:

A[np.isin(Av, Bv, invert=True)]
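
To make the view above more concrete, here is a minimal sketch of what the row view contains. It starts from a Fortran-ordered copy of the question's A, which is the case raised in the comments. The names Ac and row_dtype are introduced here for illustration only and are not part of the answer.

```python
import numpy as np

# Fortran-ordered input, as in the transpose case raised in the comments.
A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]], order="F")

# ascontiguousarray returns a C-ordered copy when the input is not C-contiguous.
Ac = np.ascontiguousarray(A)

# One structured element per row: a single field holding A.shape[1] values.
row_dtype = np.dtype([('', Ac.dtype, Ac.shape[1])])
Av = Ac.view(row_dtype).ravel()

print(A.flags['C_CONTIGUOUS'], Ac.flags['C_CONTIGUOUS'])
print(Av.shape)                      # one element per row of A
print(Av[Av.dtype.names[0]].shape)   # the field still holds the full rows
```

Because each element of Av covers the bytes of one row, a function that works element by element, such as np.isin, ends up comparing whole rows.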

7 Comments

This is clever, but: A = np.array([[3, 0, 4], [3, 1, 1], [0, 5, 9]], order="F") (and before you go "oh that's just a weird corner condition", that's the order returned by .transpose())
@CJR. I am quite aware of where order='F' comes in. Clearly this only works if the last axis is contiguous. Being able to create a view with Fortran ordered arrays is actually deprecated and you should see a warning. I removed it for the next release: github.com/numpy/numpy/pull/20722
What about using a copy to ensure the ordering/contiguity is correct?
@JérômeRichard. Added
@CJR. Fixed the copy issue

Try the following - it uses matrix multiplication for dimensionality reduction:

import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
print(A[~np.isin(A.dot(arr_max), B.dot(arr_max))])

Output:

[[3 0 4]
 [0 5 9]]

6 Comments

Oh wow, yes this does what I wanted, removing rows from A which are present in B. But I have no idea how it works :D
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
This is clever, but: A = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1]]) and B = np.array([[1, 1, 0], [1, 0, 1]])
Clever. Not perfect, but a great idea
This answer probably does not always work, since the dot product does not generate unique values. I think a cumulative product is missing (as when flattening an N-D array index). Additionally, if the maximum value or the size of the array is large, there will certainly be overflows that cause bugs.
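
One way to read the cumulative product suggestion above is as a mixed-radix encoding: each column gets its own base, and the weights are cumulative products of the bases, as when an N-D index is flattened. The sketch below is an illustration, not the answer's code. The function name remove_rows_by_code is made up, and it assumes non-empty arrays of non-negative integers whose combined range fits in int64.

```python
import numpy as np

def remove_rows_by_code(A, B):
    """Drop rows of A that also occur in B (non-negative integers only)."""
    if A.min() < 0 or B.min() < 0:
        raise ValueError("encoding assumes non-negative integers")
    # One base per column, large enough for every value in that column.
    bases = np.maximum(A.max(axis=0), B.max(axis=0)).astype(np.int64) + 1
    # Refuse inputs whose codes would not fit in int64 (checked with Python ints).
    total = 1
    for b in bases.tolist():
        total *= b
    if total > np.iinfo(np.int64).max:
        raise OverflowError("row codes would overflow int64")
    # Weight of column j is the product of the bases to its right.
    weights = np.append(np.cumprod(bases[::-1])[:-1][::-1], 1)
    codes_a = A.astype(np.int64) @ weights
    codes_b = B.astype(np.int64) @ weights
    return A[np.isin(codes_a, codes_b, invert=True)]

# Counterexample from the comments, where the plain dot product collides.
A = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1]])
B = np.array([[1, 1, 0], [1, 0, 1]])
result = remove_rows_by_code(A, B)
print(result)
```

With these weights, two rows share a code only if they are equal, so the collision in the counterexample disappears. The explicit range check turns the overflow concern into an error instead of a silent wrong answer.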

This is certainly not the most performant solution but it is relatively easy to read:

A = np.array([row for row in A if row not in B])

Edit:

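I found that the code above does not work correctly, because row in B is true as soon as any single element of B matches an element of row. This version compares whole rows instead: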

A = np.array([row for row in A if not np.equal(B, row).all(1).any()])
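
The per-row comparison above can also be written without a Python loop by using broadcasting. This is a sketch that reuses the arrays from the question. The names matches and result are illustrative, and the intermediate boolean array has len(A) * len(B) * A.shape[1] elements, so it only suits moderately sized inputs.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# matches[i, j] is True when row i of A equals row j of B in every column.
matches = (A[:, None, :] == B[None, :, :]).all(axis=2)
# Keep the rows of A that match no row of B.
result = A[~matches.any(axis=1)]
print(result)
```

Unlike the list comprehension, this keeps the result two-dimensional even when every row of A is removed.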
