
I have two numpy arrays. I want to remove the duplicated values from the first array (including the original occurrence) and remove the items at the matching positions in the second array.

For example:

a = [1, 2, 2, 3]
b = ['a', 'd', 'f', 'c']

Becomes:

a = [1, 3]
b = ['a', 'c']

I need to do this efficiently, avoiding a naive solution that would be too time-consuming.


2 Answers


Here's one with np.unique -

unq,idx,c = np.unique(a, return_index=True, return_counts=True)
unq_idx = np.sort(idx[c==1])
a_out = a[unq_idx]
b_out = b[unq_idx]

Sample run -

In [34]: a
Out[34]: array([1, 2, 2, 3])

In [35]: b
Out[35]: array(['a', 'd', 'f', 'c'], dtype='|S1')

In [36]: unq,idx,c = np.unique(a, return_index=True, return_counts=True)
    ...: unq_idx = np.sort(idx[c==1])
    ...: a_out = a[unq_idx]
    ...: b_out = b[unq_idx]

In [37]: a_out
Out[37]: array([1, 3])

In [38]: b_out
Out[38]: array(['a', 'c'], dtype='|S1')
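As a variation on the same idea, `return_inverse` can replace the explicit `np.sort`: since the inverse mapping lets you broadcast each element's count back to its original position, a boolean mask in original order falls out directly. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 2, 3])
b = np.array(['a', 'd', 'f', 'c'])

# return_inverse maps each element of a to its slot in the unique array,
# so c[inv] gives every element's occurrence count, in original order.
unq, inv, c = np.unique(a, return_inverse=True, return_counts=True)
mask = c[inv] == 1

a_out = a[mask]  # array([1, 3])
b_out = b[mask]  # array(['a', 'c'], ...)
```

Because the mask is aligned with `a` itself, the original ordering is preserved without a second sort over the surviving indices.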

3 Comments

The solution's output isn't guaranteed to have the original ordering preserved, is there a way other than sorting unq_idx to keep the original ordering?
@Akram Fixed to keep the order. Check it out.
I always found the fact you need to use np.sort frustrating here. Is there a way to keep O(n) complexity?
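On the O(n) question: `np.unique` sorts internally, so it is O(n log n) regardless. One way to get a truly linear algorithm is a hash-based count with `collections.Counter`, at the cost of a Python-level pass over the data (so the constant factor is worse than NumPy's for moderate sizes). A sketch of that trade-off:

```python
from collections import Counter
import numpy as np

a = np.array([1, 2, 2, 3])
b = np.array(['a', 'd', 'f', 'c'])

# One O(n) pass to count occurrences, one O(n) pass to build the mask.
counts = Counter(a.tolist())
mask = np.fromiter((counts[x] == 1 for x in a.tolist()),
                   dtype=bool, count=len(a))

a_out = a[mask]  # array([1, 3])
b_out = b[mask]  # array(['a', 'c'], ...)
```

Order is preserved for free, since the mask is built in a single forward pass.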

Since you are open to NumPy, you may wish to consider Pandas, which uses NumPy internally:

import pandas as pd

a = pd.Series([1, 2, 2, 3])
b = pd.Series(['a', 'd', 'f', 'c'])

flags = ~a.duplicated(keep=False)
idx = flags[flags].index

a = a[idx].values
b = b[idx].values

Result:

print(a, b, sep='\n')

[1 3]
['a' 'c']
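If the two series naturally live together, the same filter can be written against a single DataFrame, which keeps `a` and `b` aligned by construction. A minimal sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': ['a', 'd', 'f', 'c']})

# keep=False marks every member of a duplicate group in column 'a',
# so ~duplicated keeps only the values that occur exactly once.
out = df[~df['a'].duplicated(keep=False)]

a_out = out['a'].values  # array([1, 3])
b_out = out['b'].values  # array(['a', 'c'], ...)
```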

