How to vectorize a simple for loop in Python/Numpy

Question

I found dozens of examples how to vectorize for loops in Python/NumPy. Unfortunately, I don't get how I can reduce the computation time of my simple for loop using a vectorized form. Is it even possible in this case?

time = np.zeros(185000)
lat1 = np.array(([48.78,47.45],[38.56,39.53],...)) # ~ 200000 rows
lat2 = np.array(([7.78,5.45],[7.56,5.53],...)) # same number of rows as time
for ii in np.arange(len(time)):
    pos = np.argwhere( (lat1[:,0]==lat2[ii,0]) and \
                       (lat1[:,1]==lat2[ii,1]) )
    if pos.size:
        pos = int(pos)
        time[ii] = dtime[pos]

What are lat, lon and time? In particular, what are their shapes? — Fred Foo
– Fred Foo, Commented Nov 14, 2012 at 13:39
Could you explain what's the meaning of pos = np.argwhere( (lat1[:,0]==lat2[ii,0]) and (lat1[:,1]==lat2[ii,1]) ) ? So, you want to find such a row in lat2 which is equal to lat1? Aren't you afraid of float-rounding errors? If so, you could use binary search on lat2 (search in its sorted copy) — alex_jordan
– alex_jordan, Commented Nov 14, 2012 at 14:46
I am looking for rows where lat1 and lat2 are equal in both columns. I need the rownumber of lat1 and lat2 where this is the case. At the moment "ii" and "pos" give me that and it works. I used np.around(XX,decimals=2) on both arrays to avoid rounding errors. — HyperCube
– HyperCube, Commented Nov 14, 2012 at 14:57
So, if lat1 = [[1,2], [3,4], [5,6], [7,8]] and lat2 = [[3,4], [5,6], [7,8], [1,2]] so the result of the algorithm should be [1, 2, 3, 0] (0-st element of lat2 is on 1-st position on lat1, 1 element of lat2 is on 2, 2 on 3, 3 on 0) Is this what you want? — alex_jordan
– alex_jordan, Commented Nov 14, 2012 at 15:33

Alex I · Accepted Answer · 2012-11-16 12:37:39Z

3

Probably the fastest way to find all matches is to sort both arrays and walk through them together, like this working example:

import numpy as np

def is_less(a, b):
    # this ugliness is needed because we want to compare lexicographically same as np.lexsort(), from the last column backward
    for i in range(len(a)-1, -1, -1):
        if a[i]<b[i]: return True
        elif a[i]>b[i]: return False
    return False

def is_equal(a, b):
    for i in range(len(a)):
        if a[i] != b[i]: return False
    return True

# lat1 = np.array(([48.78,47.45],[38.56,39.53]))
# lat2 = np.array(([7.78,5.45],[48.78,47.45],[7.56,5.53]))
lat1 = np.load('arr.npy')
lat2 = np.load('refarr.npy')

idx1 = np.lexsort( lat1.transpose() )
idx2 = np.lexsort( lat2.transpose() )
ii = 0
jj = 0
while ii < len(idx1) and jj < len(idx2):
    a = lat1[ idx1[ii] , : ]
    b = lat2[ idx2[jj] , : ]
    if is_equal( a, b ):
        # do stuff with match
        print "match found: lat1=%s lat2=%s %d and %d" % ( repr(a), repr(b), idx1[ii], idx2[jj] )
        ii += 1
        jj += 1
    elif is_less( a, b ):
        ii += 1
    else:
        jj += 1

This may not be perfectly pythonic (perhaps someone can think of a nicer implementation using generators or itertools?) but it is hard to imagine any method that relies on searching one point at a time beating this in speed.

edited Nov 16, 2012 at 12:37

answered Nov 16, 2012 at 8:05

Alex I

20.4k12 gold badges100 silver badges166 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

HyperCube Over a year ago

Unfortunately, this does not work for my data: dropbox.com/s/xs35kvoexi85bk0/arr.npy and dropbox.com/s/g4bdk509bvwzn3u/refarr.npy. arr should be lat1 and refarr lat2 in your code.

Alex I Over a year ago

@HyperCube, please try the latest code above. Works here, 0.4 sec on your sample data (including the np.load()s). I had forgotten the (obvious) index dereferences lat1[ idx1[ii], ... ] and so forth; it just happened to work with the data I used.

HyperCube Over a year ago

Works great! It takes 5.2 seconds on my machine for the complete data set compared to 10.3 seconds of the solution from alex_jordan

alex_jordan Over a year ago

Awesome! O(N) instead of my O(N*log(N)). How haven't I thought of it.

alex_jordan · Accepted Answer · 2012-11-15 15:22:58Z

2

Here is a solution. I'm not really sure that it's possible to vectorize it. If you want to make it resistant to "float comparing error" you should modify is_less and is_greater. The whole algo is just a binary search.

import numpy as np

#lexicographicaly compare two points - a and b

def is_less(a, b):
    i = 0
    while i<len(a):
        if a[i]<b[i]:
            return True
        else:
            if a[i]>b[i]:
                return False
        i+=1
    return False

def is_greater(a, b):
    i = 0
    while i<len(a):
        if a[i]>b[i]:
            return True
        else:
            if a[i]<b[i]:
                return False
        i+=1
    return False


def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if is_less(midval, x):
            lo = mid+1
        elif is_greater(midval, x):
            hi = mid
        else:
            return mid
    return -1

def lex_sort(v): #sort by 1 and 2 column respectively
    #return v[np.lexsort((v[:,2],v[:,1]))]
    order = range(1, v.shape[1])
    return v[np.lexsort(tuple(v[:,i] for i in order[::-1]))]

def sort_and_index(arr):
    ind = np.indices((len(arr),)).reshape((len(arr), 1))
    arr = np.hstack([ind, arr]) # add an index column as first column
    arr = lex_sort(arr)
    arr_cut = arr[:,1:] # an array to do binary search in
    arr_ind = arr[:,:1] # shuffled indices
    return arr_ind, arr_cut


#lat1 = np.array(([1,2,3], [3,4,5], [5,6,7], [7,8,9])) # ~ 200000 rows
lat1 = np.arange(1,800001,1).reshape((200000,4))
#lat2 = np.array(([3,4,5], [5,6,7], [7,8,9], [1,2,3])) # same number of rows as time
lat2 = np.arange(101,800101,1).reshape((200000,4))

lat1_ind, lat1_cut = sort_and_index(lat1)

time_arr = np.zeros(200000)
import time
start = time.time()

for ii, elem in enumerate(lat2):
    pos = binary_search(lat1_cut, elem)
    if pos == -1:
        #Not found
        continue
    pos = lat1_ind[pos][0]
    #print "element in lat2 with index",ii,"has position",pos,"in lat1"
print time.time()-start

The commented print is the place where you have corresponding indices of lat1 and lat2. Works for 7 seconds on 200000 rows.

edited Nov 15, 2012 at 15:22

answered Nov 14, 2012 at 16:13

alex_jordan

8771 gold badge6 silver badges11 bronze badges

10 Comments

HyperCube Over a year ago

Indeed, this is incredibly fast! However, I get strange results..I am checking right now why. What do I need to change if I lat1 and lat2 have 4 columns I want to compare with? Is it possible to just use them directly?

HyperCube Over a year ago

Furthermore, I simply don't get what is happening in the binary search and why this is so much faster than the simple loop in my starting post.. Do you have some literature on it for me?

alex_jordan Over a year ago

Well, imagine that you have a sorted array [1, 2, 6, 17, 25, 29, 37] and you want to find the index of 25 element. One approach would be to iterate through the whole array and compare each element with the sought one. But our array is sorted so by looking at element at any index we can tell whether the sought element is to the right or to the left from this element. For example let us start from the centre of our array: current element is 17, index is 3 (counting from 0). We are seeking for 25, 25>17 so there is no sense in seeking our element in the left part (indices [0,3]).

alex_jordan Over a year ago

Thus we can instead of iterating through the whole array do the following smart procedure: 1) Select centre of input array 2) Compare current element with the sought one. 3) If equals then we've found otherwise there are two cases: sought is smaller and sought is bigger. If smaller, set input array = left half of input array and go to 1). If bigger, the same but the right half. This algorithm allows us to find the index in O(log2(N)) instead of O(N) time. Compare: N = 200000 log2(N) = 17.609640474436812. That roughly means that with binary search you will find element in maximum 17 operations.

alex_jordan Over a year ago

instead of 200000 operations with naive search algorithm. Here is a good explanation community.topcoder.com/…

|

Collectives™ on Stack Overflow

How to vectorize a simple for loop in Python/Numpy

2 Answers 2

4 Comments

10 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

10 Comments

Your Answer

Sign up or log in

Post as a guest

Related