
I have two NumPy arrays and I want to compare them index by index.

For example:

a=[1,'aaa', 'bbb', 'vvv', 'www']
b=[2,'qqq', 'bbb', 'ppp', 'www']

Normally an intersection compares each value of one array with every value of the other array.

Is there an efficient way in Python to compare two NumPy arrays index-wise? In the example above, when we intersect a and b, the value 2 in b is compared with every value in a, and likewise 'qqq' in b is compared with every value in a. In the worst case this gives n*n complexity, where n is the length of the arrays.

For the example above, the intersection gives 2 (the matches are 'bbb' and 'www').

What I want is an index-wise intersection: when b is compared with a, the value 2 in b should be compared only with the value 1 in a, 'qqq' in b only with 'aaa' in a, and so on.

This would also avoid the n*n worst-case complexity of the intersection above.
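The set intersection and the index-wise comparison happen to both give 2 on this example, but they measure different things. A small sketch with plain lists shows the difference; the lists c and d are made up for illustration. (CPython sets are hash-based, so on average an intersection does not compare every pair; n*n is only the worst case.)

```python
a = [1, 'aaa', 'bbb', 'vvv', 'www']
b = [2, 'qqq', 'bbb', 'ppp', 'www']

# Set intersection: values present in both, regardless of position.
shared = set(a) & set(b)

# Index-wise: only values that match at the same position.
same_position = [x for x, y in zip(a, b) if x == y]

print(len(shared), len(same_position))  # 2 2 for this data

# Hypothetical reordered data: the two measures now disagree.
c = ['bbb', 'www']
d = ['www', 'bbb']
print(len(set(c) & set(d)), sum(x == y for x, y in zip(c, d)))  # 2 0
```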


3 Answers


If I understand what you're after, you can create arrays from the lists and compare them directly; you can then get the count by calling sum:

In [161]:

import numpy as np
a=[1,'aaa', 'bbb', 'vvv', 'www']
b=[2,'qqq', 'bbb', 'ppp', 'www']
A = np.array(a)
B = np.array(b)
sum(A==B)
Out[161]:
2

Performing an equality comparison produces a boolean array:

In [166]:

A==B
Out[166]:
array([False, False,  True, False,  True], dtype=bool)

When you call sum on this, the True values are cast to 1 and the False values to 0, so the result is the number of True values.
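A minimal sketch of the same idea outside IPython, showing the boolean mask and two equivalent ways to count it, plus np.flatnonzero to get the matching positions. Note that because the lists mix an int with strings, np.array stores everything as strings, so 1 becomes '1':

```python
import numpy as np

a = [1, 'aaa', 'bbb', 'vvv', 'www']
b = [2, 'qqq', 'bbb', 'ppp', 'www']

# Mixed int/str lists become a Unicode string array, so 1 is stored as '1'.
A = np.array(a)
B = np.array(b)

mask = A == B                     # element-wise, index-by-index comparison
matches = int(mask.sum())         # True counts as 1, False as 0
nonzero = np.count_nonzero(mask)  # same count, computed directly
positions = np.flatnonzero(mask)  # indices where the arrays agree

print(mask.tolist(), matches, nonzero, positions.tolist())
# [False, False, True, False, True] 2 2 [2, 4]
```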

EDIT

It is faster to call .sum() directly on the NumPy array:

In [173]:

a=[1,'aaa', 'bbb', 'vvv', 'www']
a *= 100
b=[2,'qqq', 'bbb', 'ppp', 'www']
b *= 100
A = np.array(a)
B = np.array(b)
%timeit (A==B).sum()
%timeit sum(A==B)
The slowest run took 2784.03 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 11.4 µs per loop
1000 loops, best of 3: 1.34 ms per loop

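The built-in sum is significantly slower, which is to be expected: it iterates over the boolean array one element at a time in Python, whereas .sum() does the reduction in compiled NumPy code.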
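Since %timeit only exists in IPython, here is a rough equivalent using the standard timeit module. It is a sketch: the absolute numbers depend on your machine and NumPy version, and only the relative gap is meaningful.

```python
import timeit

import numpy as np

a = [1, 'aaa', 'bbb', 'vvv', 'www'] * 100
b = [2, 'qqq', 'bbb', 'ppp', 'www'] * 100
A = np.array(a)
B = np.array(b)

# Both expressions count the same matches (2 per block of 5, times 100)...
assert (A == B).sum() == sum(A == B) == 200

# ...but the built-in sum loops over NumPy scalars in Python.
t_method = timeit.timeit(lambda: (A == B).sum(), number=1000)
t_builtin = timeit.timeit(lambda: sum(A == B), number=1000)
print(f"(A == B).sum(): {t_method:.4f} s, sum(A == B): {t_builtin:.4f} s")
```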


5 Comments

Thanks a lot, EdChum. But when I run it on a very large dataset, converting the whole np.array into sets and then performing a.intersection(b) gives approximately 10 times better performance than sum(a==b).
@Sam Yes, that would be better in that case. Could you try (A==B).sum()?
Hi Ed, even this takes longer. The set intersection for 8000 records and 22 columns takes 44 s, whereas (A==B).sum() takes 252 s.
OK, but does that timing include the conversion into sets and the intersection?
@Sam: What exactly do you mean by a record and a column? Is there more than an a and a b? Or is there actually a large 2D array in which you are comparing rows?
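If, as the last comment asks, the data is really two equally shaped 2D arrays with one record per row, the same element-wise comparison works along an axis. A minimal sketch with made-up data, assuming rows are records and columns are fields:

```python
import numpy as np

# Hypothetical example: each row is a record, each column a field.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0, 3],
              [4, 5, 6]])

mask = A == B                      # same shape as A, compared cell by cell
per_row = mask.sum(axis=1)         # matching fields in each record
identical_rows = mask.all(axis=1)  # records equal in every column
total = np.count_nonzero(mask)     # matching cells overall

print(per_row.tolist(), identical_rows.tolist(), total)  # [2, 3] [False, True] 5
```

This is still a positional comparison (cell [i, j] of A against cell [i, j] of B), not a set intersection, so the two timings in the comments are not measuring the same result.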

Use Python's built-in zip function:

zip(a, b)

This pairs the elements of a and b by position. In Python 2 it returns a new list of tuples; in Python 3 it returns an iterator, so wrap it in list() to see the pairs.

For example:

>>> list(zip([1, 2, 3], [4, 5, 6]))
[(1, 4), (2, 5), (3, 6)]

Then you can run your comparison on each pair:

matches = 0
for x, y in zip(a, b):
    if x == y:  # replace == with your own comparison
        matches += 1
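A compact sketch of the same idea, assuming you want both the number of matching positions and where they are. Unlike np.array, plain lists keep 1 as an int rather than converting it to the string '1':

```python
a = [1, 'aaa', 'bbb', 'vvv', 'www']
b = [2, 'qqq', 'bbb', 'ppp', 'www']

# zip pairs a[i] with b[i], so each element is compared only once: O(n).
match_count = sum(x == y for x, y in zip(a, b))  # True adds 1, False adds 0

# enumerate adds the index, giving the positions where the lists agree.
match_positions = [i for i, (x, y) in enumerate(zip(a, b)) if x == y]

print(match_count, match_positions)  # 2 [2, 4]
```

zip stops at the end of the shorter input; on Python 3.10 and later, zip(a, b, strict=True) raises ValueError if the lengths differ.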



If I understand what you want, here is a way:

Using IPython:

In [1]: import numpy as np

In [2]: a = np.array([1, 'aaa', 'bbb', 'vvv', 'www'])

In [3]: b = np.array([2, 'qqq', 'bbb', 'ppp', 'www'])

In [4]: (a == b).sum()
Out[4]: 2

