I have two numpy arrays of different sizes: one with an index column along with x, y, z coordinates, and the other containing just the coordinates. (Please ignore the first serial number; I can't figure out the formatting.) The second array has fewer coordinates, and I need the indexes (atomID) of those coordinates from the first array.

Array1 (with index column):

    serialNo. moleculeID atomID x y z
  1. 1 1 2 0 7.7590151 7.2925348 12.5933323
  2. 2 1 2 0 7.123642 6.1970949 11.5622416
  3. 3 1 6 0 6.944543 7.0390449 12.0713224
  4. 4 1 2 0 8.8900348 11.5477333 13.5633965
  5. 5 1 2 0 7.857268 12.8062735 13.4357052
  6. 6 1 6 0 8.2124357 12.1004238 14.0486889

Array2 (just the coordinates):

x          y             z
  1. 7.7590151 7.2925348 12.5933323
  2. 7.123642 6.1970949 11.5622416
  3. 6.944543 7.0390449 12.0713224
  4. 8.8900348 11.5477333 13.5633965

The array with the index column (atomID) has the indexes 2, 2, 6, 2, 2 and 6. How can I get the indexes for the coordinates that are common to Array1 and Array2? I expect to get 2, 2, 6, 2 as a list and then concatenate it with the second array. Any easy ideas?

Update:

Tried using the following code, but it doesn't seem to be working.

import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])

b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

print a
print b

for i in range(len(b)):
 for j in range(len(a)):
    if a[j,1]==b[i,0]:
        x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
        #continue
    else:
        print 'not true'
print x 

which outputs the following:

not true
not true
not true
not true
not true
not true
not true
not true
not true
[[ 3.   2.2  5. ]
 [ 3.  -6.3  0. ]
 [ 3.   3.6  8. ]]

but expectation was:

    [[ 4.   2.2  5. ]
     [ 2.  -6.3  0. ]
     [ 3.   3.6  8. ]]

  • If you have numpy, look into the hstack function. Commented Aug 4, 2015 at 19:57
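
As a minimal sketch of what the hstack suggestion could look like once the IDs are known, the snippet below prepends an ID column to a coordinate array. The names `coords`, `ids`, `with_ids` and `same` are hypothetical and the sample values are borrowed from the question's test arrays; finding the IDs in the first place is a separate step.

```python
import numpy as np

# Hypothetical sample data: three coordinate rows and one ID per row.
coords = np.array([[2.2, 5.0], [-6.3, 0.0], [3.6, 8.0]])
ids = np.array([4.0, 2.0, 3.0])

# hstack needs 2-D inputs with the same number of rows, so the 1-D ids
# array is reshaped into a single column first.
with_ids = np.hstack([ids.reshape(-1, 1), coords])

# column_stack does the same reshaping automatically for 1-D inputs.
same = np.column_stack([ids, coords])

print(with_ids)
```

Both calls build a new array; neither modifies `coords` in place.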

4 Answers

2

The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient/vectorized manner:

import numpy_indexed as npi
print(a[npi.contains(b, a[:, 1:])])

The currently accepted answer strikes me as incorrect for points that differ only in their later coordinates. Performance should also be much better here: not only is this solution vectorized, but its worst-case performance is O(N log N), as opposed to the quadratic time complexity of the currently accepted answer.
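
For readers without numpy_indexed installed, here is a rough plain-Python alternative (not the numpy_indexed API, and not vectorized). It is a sketch that assumes exact floating-point equality and that each coordinate row occurs only once in `a`. It matches on all coordinates and returns the IDs in the order of `b`, using a dictionary so the lookup avoids the nested loop. The names `id_by_coords`, `ids` and `result` are made up for this illustration.

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

# Map each full coordinate row of a to its ID. Tuples of floats are
# hashable, so this is an exact-equality lookup on all coordinates.
id_by_coords = {tuple(row[1:]): row[0] for row in a}

# Look up every row of b; a KeyError here would mean that b holds a
# point that is absent from a.
ids = np.array([id_by_coords[tuple(row)] for row in b])

# Prepend the IDs to b as a new first column.
result = np.column_stack([ids, b])
print(result)
```

If the coordinates come from different computations and may differ by rounding error, exact lookups like this one will miss matches, and a tolerance-based comparison is the safer choice.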

2

Two concise vectorized ways to do it using cdist -

import numpy as np
from scipy.spatial.distance import cdist

out = a[np.any(cdist(a[:,1:],b)==0,axis=1)]

Or if you don't mind getting a bit voodoo-ish, here's np.einsum to replace np.any -

out = a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]

Sample run -

In [15]: from scipy.spatial.distance import cdist

In [16]: a
Out[16]: 
array([[  4. ,   2.2,   5. ],
       [  2. ,  -6.3,   0. ],
       [  3. ,   3.6,   8. ],
       [  5. ,  -9.8,  50. ]])

In [17]: b
Out[17]: 
array([[ 2.2,  5. ],
       [-6.3,  0. ],
       [ 3.6,  8. ]])

In [18]: a[np.any(cdist(a[:,1:],b)==0,axis=1)]
Out[18]: 
array([[ 4. ,  2.2,  5. ],
       [ 2. , -6.3,  0. ],
       [ 3. ,  3.6,  8. ]])

In [19]: a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]
Out[19]: 
array([[ 4. ,  2.2,  5. ],
       [ 2. , -6.3,  0. ],
       [ 3. ,  3.6,  8. ]])
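
The `== 0` test above requires the coordinates to be bit-for-bit identical. Below is a hedged variant that accepts a small tolerance and keeps the rows in the order of `b`. To stay dependency-free, it swaps `cdist` for plain NumPy broadcasting, which computes the same Euclidean distance matrix. The tolerance `tol`, the noisy `b`, and the names `nearest`, `matched` and `out` are assumptions made for this sketch.

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
# Hypothetical b whose values carry small rounding noise.
b = np.array([[2.2 + 1e-9, 5], [-6.3, 1e-9], [3.6, 8]])

tol = 1e-6  # assumed tolerance; choose one that suits the data

# Pairwise Euclidean distances via broadcasting, shape (len(b), len(a)).
# scipy's cdist(b, a[:, 1:]) computes the same matrix.
diff = b[:, None, :] - a[None, :, 1:]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Closest row of a for each row of b.
nearest = dist.argmin(axis=1)
# True where that closest row is within the tolerance.
matched = dist[np.arange(len(b)), nearest] <= tol

# IDs of the matched rows, prepended to the matching rows of b.
out = np.column_stack([a[nearest[matched], 0], b[matched]])
print(out)
```

Like the `cdist` versions, this builds the full distance matrix, so memory grows with `len(a) * len(b)`.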

1

This is just a sketch for your question, assuming each row of array1 is [atomID, x, y, z] and each row of array2 is [x, y, z]:

import numpy as np

rows = []
for coords in array2:
    for element in array1:
        if np.array_equal(coords, element[1:]):  # compare the coordinates of the two elements
            rows.append(np.insert(coords, 0, element[0]))  # insert the atomID at the beginning of the coordinate array
            break
result = np.array(rows)

2 Comments

I was thinking something similar, but is there any other shorter way?
Shorter as in less code, or as in a more efficient way to do it?
0

Using a list instead of array for the values of np.insert did the trick.

import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])

b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

print a
print b
x = []

for i in range(len(b)):
    for j in range(len(a)):
        if a[j,1]==b[i,0]:
            x.append(a[j,0])
print np.insert(b,0,x,axis=1)

which would output:

[[ 4.   2.2  5. ]
 [ 2.  -6.3  0. ]
 [ 3.   3.6  8. ]]
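
Note that the loop above tests only the first coordinate (`a[j,1]==b[i,0]`), so two points that share that value would both be appended. Here is a sketch of a variant that compares every coordinate, using made-up data in which two rows of `a` share the same first coordinate. It assumes every row of `b` occurs in `a`; the names `match`, `ids` and `result` are hypothetical.

```python
import numpy as np

# Hypothetical data in which two points share the same first coordinate.
a = np.array([[4, 2.2, 5], [7, 2.2, 9], [3, 3.6, 8]])
b = np.array([[2.2, 9], [3.6, 8]])

# match[i, j] is True when every coordinate of b[i] equals that of a[j].
match = (b[:, None, :] == a[None, :, 1:]).all(axis=2)

# argmax returns the first True in each row; this assumes every row of b
# occurs in a (an all-False row would wrongly yield index 0).
ids = a[match.argmax(axis=1), 0]

# Same np.insert call as above, with the IDs as the new first column.
result = np.insert(b, 0, ids, axis=1)
print(result)
```

With this data, a first-coordinate-only test would collect both ID 4 and ID 7 for the first row of `b`, leaving more IDs than rows to insert.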
