How to compare between two numpy arrays of different size and return the index column with common elements?

Question

I have two numpy arrays of different sizes: one with an index column along with x, y, z coordinates, and the other containing just the coordinates. (Please ignore the first serial no.; I can't figure out the formatting.) The second array has fewer coordinates, and I need the indexes (atomID) of those coordinates from the first array.

Array1 (with index column):

    serialNo. moleculeID atomID x y z

1 1 2 0 7.7590151 7.2925348 12.5933323
2 1 2 0 7.123642 6.1970949 11.5622416
3 1 6 0 6.944543 7.0390449 12.0713224
4 1 2 0 8.8900348 11.5477333 13.5633965
5 1 2 0 7.857268 12.8062735 13.4357052
6 1 6 0 8.2124357 12.1004238 14.0486889

Array2 (just the coordinates):

x          y             z

7.7590151 7.2925348 12.5933323
7.123642 6.1970949 11.5622416
6.944543 7.0390449 12.0713224
8.8900348 11.5477333 13.5633965

The array with the index column (atomID) has the indexes 2, 2, 6, 2, 2 and 6. How can I get the indexes for the coordinates that are common to Array1 and Array2? I expect to get 2 2 6 2 as a list and then concatenate it with the second array. Any easy ideas?

Update:

I tried the following code, but it doesn't seem to work.

import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])

b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

print a
print b

for i in range(len(b)):
 for j in range(len(a)):
    if a[j,1]==b[i,0]:
        x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
        #continue
    else:
        print 'not true'
print x

which outputs the following:

not true
not true
not true
not true
not true
not true
not true
not true
not true
[[ 3.   2.2  5. ]
 [ 3.  -6.3  0. ]
 [ 3.   3.6  8. ]]

but expectation was:

    [[ 4.   2.2  5. ]
     [ 2.  -6.3  0. ]
     [ 3.   3.6  8. ]]

If you have numpy, look into the hstack function.

– The Brofessor, Commented Aug 4, 2015 at 19:57

Eelco Hoogendoorn · Answer · 2016-04-29 13:33:17Z

2

The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient/vectorized manner:

import numpy_indexed as npi
print(a[npi.contains(b, a[:, 1:])])

The currently accepted answer strikes me as being incorrect for points which differ in their latter coordinates. And performance should be much improved here as well; not only is this solution vectorized, but worst case performance is NlogN, as opposed to the quadratic time complexity of the currently accepted answer.
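For readers who would rather not add a dependency, here is a minimal sketch of a similar sort-based row lookup in plain NumPy. It is not how numpy_indexed is implemented; the helper name rows_in is hypothetical, and it assumes the coordinates match exactly as floats.

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])


def rows_in(candidates, reference):
    """Boolean mask: which rows of `candidates` also appear in `reference`.

    Hypothetical helper for illustration; relies on exact float equality.
    """
    stacked = np.vstack([reference, candidates])
    # Give every distinct row an integer label (np.unique sorts internally).
    _, ids = np.unique(stacked, axis=0, return_inverse=True)
    ids = ids.ravel()  # the inverse can come back 2-D in some NumPy versions
    ref_ids = ids[:len(reference)]
    cand_ids = ids[len(reference):]
    # A candidate row is present if its label also occurs among the reference rows.
    return np.isin(cand_ids, ref_ids)


mask = rows_in(a[:, 1:], b)
result = a[mask]  # rows of a whose coordinates occur in b
print(result)
```

Because whole rows are labelled, two points that share a first coordinate but differ in a later one are treated as different points.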

Divakar · Accepted Answer · 2015-08-05 05:43:29Z

Two concise vectorized ways to do it using cdist -

from scipy.spatial.distance import cdist

out = a[np.any(cdist(a[:,1:],b)==0,axis=1)]

Or if you don't mind getting a bit voodoo-ish, here's np.einsum to replace np.any -

out = a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]

Sample run -

In [15]: from scipy.spatial.distance import cdist

In [16]: a
Out[16]: 
array([[  4. ,   2.2,   5. ],
       [  2. ,  -6.3,   0. ],
       [  3. ,   3.6,   8. ],
       [  5. ,  -9.8,  50. ]])

In [17]: b
Out[17]: 
array([[ 2.2,  5. ],
       [-6.3,  0. ],
       [ 3.6,  8. ]])

In [18]: a[np.any(cdist(a[:,1:],b)==0,axis=1)]
Out[18]: 
array([[ 4. ,  2.2,  5. ],
       [ 2. , -6.3,  0. ],
       [ 3. ,  3.6,  8. ]])

In [19]: a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]
Out[19]: 
array([[ 4. ,  2.2,  5. ],
       [ 2. , -6.3,  0. ],
       [ 3. ,  3.6,  8. ]])
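Testing a distance against exactly 0 only works when the coordinates are bit-for-bit identical. If the two arrays came from different files or went through rounding, a tolerance-based comparison may be safer. The following is only a sketch using broadcasting and np.isclose with its default tolerances; the tiny offset added to b is made up purely to illustrate the point.

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
# Illustrative assumption: b carries a tiny rounding error.
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]]) + 1e-12

# Compare every coordinate row of a with every row of b.
# a[:, None, 1:] has shape (4, 1, 2); b[None, :, :] has shape (1, 3, 2).
close = np.isclose(a[:, None, 1:], b[None, :, :])  # shape (4, 3, 2)

# A pair of rows matches when all of its coordinates are close;
# a row of a is kept when it matches at least one row of b.
mask = close.all(axis=2).any(axis=1)
out = a[mask]
print(out)
```

Like cdist, this builds a len(a) by len(b) intermediate, so memory use grows quickly for large arrays.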

Rafael Rios · Accepted Answer · 2015-08-04 20:09:06Z

1

This is just a rough sketch for your question (it assumes array1 rows are [atomID, x, y, z] and array2 rows are [x, y, z]):

import numpy as np

rows = []
for coords in array2:
    for element in array1:
        # compare the coordinates of the two elements
        if np.array_equal(element[1:], coords):
            # put the atomID at the beginning of the coordinate row
            rows.append(np.insert(coords, 0, element[0]))
            break
result = np.array(rows)

2 Comments

Rafat Over a year ago

I was thinking something similar, but is there any other shorter way?

Rafael Rios Over a year ago

Shorter as in less code, or as in a more efficient way to do it?

Rafat · Accepted Answer · 2015-08-04 22:57:47Z

0

Using a list instead of an array for the values passed to np.insert did the trick.

import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])

b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

print(a)
print(b)
x = []  # collect the matching index values in a plain list

for i in range(len(b)):
    for j in range(len(a)):
        # note: only the first coordinate of each row is compared
        if a[j, 1] == b[i, 0]:
            x.append(a[j, 0])
print(np.insert(b, 0, x, axis=1))

which would output:

[[ 4.   2.2  5. ]
 [ 2.  -6.3  0. ]
 [ 3.   3.6  8. ]]
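The same idea can be applied to the coordinate data from the question while matching on all three coordinates. This is a sketch under assumptions: the rows are taken as posted (seven values each), with atomID at column index 2 and x, y, z in the last three columns, and the coordinates are assumed to match exactly. The names lookup, atom_ids and combined are made up for the example.

```python
import numpy as np

# Rows as posted in the question. Assumed layout: atomID is at column
# index 2 and x, y, z are the last three columns.
array1 = np.array([
    [1, 1, 2, 0, 7.7590151, 7.2925348, 12.5933323],
    [2, 1, 2, 0, 7.123642, 6.1970949, 11.5622416],
    [3, 1, 6, 0, 6.944543, 7.0390449, 12.0713224],
    [4, 1, 2, 0, 8.8900348, 11.5477333, 13.5633965],
    [5, 1, 2, 0, 7.857268, 12.8062735, 13.4357052],
    [6, 1, 6, 0, 8.2124357, 12.1004238, 14.0486889],
])
array2 = np.array([
    [7.7590151, 7.2925348, 12.5933323],
    [7.123642, 6.1970949, 11.5622416],
    [6.944543, 7.0390449, 12.0713224],
    [8.8900348, 11.5477333, 13.5633965],
])

# Map each (x, y, z) tuple to its atomID in a single pass over array1.
lookup = {tuple(row[4:]): row[2] for row in array1}

# Look up every coordinate row of array2; a missing point raises KeyError.
atom_ids = [lookup[tuple(xyz)] for xyz in array2]

# Prepend the atomID column to the coordinates.
combined = np.column_stack([atom_ids, array2])
print(atom_ids)
print(combined)
```

The dictionary replaces the inner loop, so each coordinate row is found with one lookup, and the result follows the row order of array2.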
