Index-by-index comparison of two numpy arrays in Python

Question

I have two numpy arrays and I want to compare them index by index.

For example

a=[1,'aaa', 'bbb', 'vvv', 'www']
b=[2,'qqq', 'bbb', 'ppp', 'www']

Normally an intersection compares each value of one array to each value of the other array.

Is there an efficient way in Python to compare two np arrays index-wise? In the example above, when we intersect a and b, the value 2 of array b is compared to all the values in a; similarly, the value 'qqq' of array b is compared to all the values in a. In the worst case this gives n*n complexity, n being the length of the arrays.

The output for the above example would be 2 (True for 'bbb' and 'www').

What I want is for the comparison to be made index-wise. When array b is compared to a, the value 2 in b should be compared only to the value 1 in a, the object 'qqq' of b should be compared only to the object 'aaa' of a, and so on.

This would also avoid the n*n worst-case complexity of the intersection above.

EdChum · Accepted Answer · 2015-05-22 13:34:23Z

1

If I understand what you're after, you can just create arrays from the lists and compare them directly. You can then get the count by calling sum:

In [161]:

import numpy as np

a = [1, 'aaa', 'bbb', 'vvv', 'www']
b = [2, 'qqq', 'bbb', 'ppp', 'www']
A = np.array(a)
B = np.array(b)
sum(A == B)
Out[161]:
2

Performing an equality comparison on the two arrays produces a boolean array:

In [166]:

A==B
Out[166]:
array([False, False,  True, False,  True], dtype=bool)

When you call sum on this, the True values are cast to 1 and the False values to 0, so the sum is the number of True values.
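As a minimal sketch of the same idea, the mask can also tell you where the matches are, not just how many there are. The names mask, match_count and match_positions are illustrative, and dtype=object is an assumption made here so that 1 and 2 stay integers rather than being converted to strings:

```python
import numpy as np

a = [1, 'aaa', 'bbb', 'vvv', 'www']
b = [2, 'qqq', 'bbb', 'ppp', 'www']

# Assumption: dtype=object keeps the original Python values (ints stay ints).
A = np.array(a, dtype=object)
B = np.array(b, dtype=object)

# Element-wise comparison: one boolean per index, no cross comparisons.
mask = A == B

# Count the positions where the two arrays agree.
match_count = int(np.count_nonzero(mask))

# Indices i for which a[i] == b[i].
match_positions = np.flatnonzero(mask).tolist()

print(match_count)      # 2
print(match_positions)  # [2, 4]
```

Each index is visited once, so the work grows linearly with the array length.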

EDIT

It will be more performant to just call .sum() on the np.array:

In [173]:

a=[1,'aaa', 'bbb', 'vvv', 'www']
a *=100
b=[2,'qqq', 'bbb', 'ppp', 'www']
b *=100
A = np.array(a)
B = np.array(b)
%timeit (A==B).sum()
%timeit sum(A==B)
The slowest run took 2784.03 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 11.4 µs per loop
1000 loops, best of 3: 1.34 ms per loop

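The top-level (built-in) sum is significantly slower, which is to be expected: it iterates over the array one element at a time in Python, whereas .sum() does the reduction in compiled NumPy code.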
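Outside IPython, a rough equivalent of the timing above can be sketched with the standard timeit module. The variable names and the repeat count of 200 are arbitrary choices, the lists are written with string elements only so that no type conversion is involved, and the absolute timings will differ from machine to machine:

```python
import timeit

import numpy as np

# 500-element arrays, built the same way as in the answer (list * 100).
A = np.array(['1', 'aaa', 'bbb', 'vvv', 'www'] * 100)
B = np.array(['2', 'qqq', 'bbb', 'ppp', 'www'] * 100)

# Both approaches must agree on the count before timing them.
numpy_total = int((A == B).sum())
builtin_total = int(sum(A == B))

# Time 200 repetitions of each variant.
numpy_seconds = timeit.timeit(lambda: (A == B).sum(), number=200)
builtin_seconds = timeit.timeit(lambda: sum(A == B), number=200)

print(numpy_total, builtin_total)
print(f"ndarray.sum(): {numpy_seconds:.4f} s, built-in sum(): {builtin_seconds:.4f} s")
```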

edited May 22, 2015 at 13:34

answered May 22, 2015 at 13:01

EdChum


5 Comments

Sam Over a year ago

Thanks a lot, EdChum. But when I run it on a very large dataset, converting the whole np.array into sets and then performing a.intersection(b) gives approximately 10 times better performance than sum(a==b).

EdChum Over a year ago

@Sam Yes, that would be better in that case. Could you try (A==B).sum()?

Sam Over a year ago

Hi Ed, even this takes longer. The set intersection for 8000 records and 22 columns takes 44 sec, whereas (A==B).sum() takes 252 sec.

EdChum Over a year ago

OK but is the timing including the conversion into sets and intersection?

rickhg12hs Over a year ago

@Sam: What exactly are you calling a record and a column? Is there more than just an a and a b? Or is there actually a large 2D array where you are comparing rows?
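One point worth keeping in mind when comparing the two timings in this thread: a set intersection and an index-wise comparison answer different questions, so they are not interchangeable. The small sketch below uses made-up data (the arrays and variable names are illustrative, not taken from Sam's dataset) to show the two results diverging:

```python
import numpy as np

a = np.array(['x', 'y', 'y', 'z'])
b = np.array(['y', 'x', 'y', 'q'])

# Set intersection ignores position and duplicates:
# it reports which distinct values occur in both arrays.
set_common = set(a.tolist()) & set(b.tolist())

# Index-wise comparison only counts a match when the same value
# sits at the same position in both arrays.
positional_matches = int((a == b).sum())

print(sorted(set_common))   # ['x', 'y']
print(positional_matches)   # 1
```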

bagrat · Accepted Answer · 2015-05-22 13:05:43Z

0

Use Python's built-in zip function:

zip(a, b)

This pairs the elements of a and b member-wise. In Python 2 it returns a new list of tuples; in Python 3 it returns a lazy iterator, so wrap it in list() to see the pairs.

For example:

>>> list(zip([1, 2, 3], [4, 5, 6]))
[(1, 4), (2, 5), (3, 6)]

Then you can run your comparison on each pair:

for x, y in zip(a, b):
    your_comparison(x, y)
    ...
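As one possible way to fill in the loop, assuming plain equality is the comparison you want, the matches can be counted directly on the original lists without converting them to arrays. The names matches and matching_indices are illustrative:

```python
a = [1, 'aaa', 'bbb', 'vvv', 'www']
b = [2, 'qqq', 'bbb', 'ppp', 'www']

# zip pairs a[i] with b[i]; each pair is compared exactly once.
# True counts as 1 and False as 0 when summed.
matches = sum(x == y for x, y in zip(a, b))

# Positions at which the two lists hold equal values.
matching_indices = [i for i, (x, y) in enumerate(zip(a, b)) if x == y]

print(matches)           # 2
print(matching_indices)  # [2, 4]
```

This stays in pure Python, so the elements keep their original types (1 and 2 remain integers), at the cost of being slower than a vectorised comparison on large inputs.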

edited May 22, 2015 at 13:05

answered May 22, 2015 at 12:59

bagrat


rickhg12hs · Accepted Answer · 2015-05-22 13:44:11Z

0

If I understand what you want, here is a way:

[Using ipython]

In [1]: import numpy as np

In [2]: a = np.array([1, 'aaa', 'bbb', 'vvv', 'www'])

In [3]: b = np.array([2, 'qqq', 'bbb', 'ppp', 'www'])

In [4]: (a == b).sum()
Out[4]: 2

edited May 22, 2015 at 13:44

answered May 22, 2015 at 13:16

rickhg12hs
