
I have a numpy array that I'm iterating through with:

import numpy

array = numpy.array([[1, 1, 2, 8, 2, 2],
                     [5, 5, 4, 1, 3, 2],
                     [5, 5, 4, 1, 3, 2],
                     [5, 5, 4, 1, 3, 2],
                     [9, 5, 8, 8, 2, 2],
                     [7, 3, 6, 6, 2, 2]])


Pixels = ['U','D','R','L','UL','DL','UR','DR']

for i in range(1, array.shape[0] - 1):
    for j in range(1, array.shape[1] - 1):

        list = []
        # Start the walk at (i, j); updated below as we follow the maxima.
        iToMakeList = i
        jToMakeList = j
        while len(list) < 2:
            # Stop once the walk reaches the border of the array.
            if (iToMakeList >= array.shape[0] - 1 or iToMakeList < 1 or
                    jToMakeList >= array.shape[1] - 1 or jToMakeList < 1):
                break

            PixelCoord = {
                'U':  (iToMakeList - 1, jToMakeList),
                'D':  (iToMakeList + 1, jToMakeList),
                'R':  (iToMakeList,     jToMakeList + 1),
                'L':  (iToMakeList,     jToMakeList - 1),
                'UL': (iToMakeList - 1, jToMakeList - 1),
                'DL': (iToMakeList + 1, jToMakeList - 1),
                'UR': (iToMakeList - 1, jToMakeList + 1),
                'DR': (iToMakeList + 1, jToMakeList + 1),
            }
            # Value of each neighbour; a tuple indexes a numpy array directly.
            Value = {pixel: array[PixelCoord[pixel]] for pixel in Pixels}

            # Pick the neighbour with the greatest value (ties broken by name).
            candidates = [(Value[pixel], pixel) for pixel in Pixels]
            Lightest = max(candidates)

            list.append(PixelCoord[Lightest[1]])
            iToMakeList, jToMakeList = PixelCoord[Lightest[1]]

I want to accelerate this process. It's very slow.

Assume that the output of this snippet is my final goal and the ONLY thing I want to do is accelerate this code.

Comments

  • Your code doesn't make sense, since array.shape[0] is a number, not something you can iterate over. Also, how to vectorize it (or whether that is possible at all) depends on the "stuff" you are doing inside the loop.
  • Presumably for i in array.shape[0]: should be for i in range(array.shape[0]): (a mistake I've made more than once).
  • So please verify whether I am on the right track with your code: 1) you slide through your array in a 3x3 sub-matrix fashion, 2) find the location of the maximum in this sub-matrix, and 3) append that value to a list until the list is 100 elements long? So essentially you are doing a 2D convolution operation with a kernel that finds the maximum of all values?
  • Yes, it is a kind of line tracer.
  • If a number is bigger than all of its neighbors, what happens? For example in 1D: if you had 1 3 2 4 3 5 ... you would start at 1 and move to 3, and then? Stay at 3? Move to 2? If you move to 2, it then goes to 4. So this won't find the local maximum.

6 Answers


For your question to make sense to me, I think you need to move where list = [] appears. Otherwise you'll never get past even i=0, j=1 until the list is full. I also can't imagine that it is slow as currently implemented: the list will fill very quickly, and then the for loops should be fast. Here is what I believe you intended; please clarify if this is not correct.

for i in range(0, array.shape[0]):
    for j in range(0, array.shape[1]):
        list = []
        while len(list) < 100:
            print("identity", i, j)

            # Find the neighbouring entry with the greatest value
            # (e.g., assume it is [i-1, j] with value 10).
            list.append((i - 1, j))
            i = i - 1
            j = j
        # Perform operations on list.

Let's do some modifications. I'll assume there is a function get_max_nbr(i,j) which returns the coordinates of the maximum neighbor. One of the places your code is slow is that it calls get_max_nbr for the same coordinate many times (each pass of the inner while loop performs 100 of these lookups). The code below uses memoization to get around this (down to one computation per coordinate on average). So if this is your bottleneck, it should get you close to a 100x speedup.

maxnbr = {}
for i in range(0,array.shape[0]):
    for j in range (0,array.shape[1]):
        list = []
        current_loc = (i,j)
        while len(list) < 100:
            if current_loc not in maxnbr:  #if this is our first time seeing current_loc
                maxnbr[current_loc] = get_max_nbr(*current_loc) #note func(*(i,j)) becomes func(i,j)
            current_loc = maxnbr[current_loc]
            list.append(current_loc)
        #process list here                 

This doesn't vectorize the problem, but it does create the list I believe you want, and it should be a significant improvement. If we knew more about how the list is processed afterwards, we might find an even better approach, but that isn't clear from the question.
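
For completeness, here is one way the assumed get_max_nbr could look. This is a hypothetical sketch, not part of the original answer: the 8-neighbour definition and the in-bounds assumption mirror the question's code, and it reads the question's global array.

import numpy

def get_max_nbr(i, j):
    # Hypothetical helper: returns the global (row, col) of the largest
    # of the 8 neighbours of (i, j). Assumes (i, j) is not on the border.
    window = array[i-1:i+2, j-1:j+2].astype(float)
    window[1, 1] = -numpy.inf  # exclude the centre pixel itself
    di, dj = numpy.unravel_index(window.argmax(), window.shape)
    return (i + di - 1, j + dj - 1)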



So this is my parallel approach. First I create a lookup table in which every pixel points to the coordinates of its nearest-neighbour maximum. The code runs in about 2 seconds for a 100*100 matrix on my Intel i7 dual-core CPU. So far the code is not optimized, and the data handling inside the multiprocessing part is a little awkward and could surely be simplified. Just let me know if this is what you want. The code only adds the coordinates of the data points to the list; if you need the values instead, change it at the appropriate points, or just parse the resulting lines[] list.

import numpy
import multiprocessing as mp
import time

start = time.time()
# Getting the number of CPUs present
num_cpu = mp.cpu_count()
# Creation of random data for testing
data = numpy.random.randint(1, 30, size=(200, 200))
x, y = data.shape

# Padding is introduced to cope with the border of the dataset.
# Change this if you want other behaviour like wrapping, reflection etc.
def pad(data):
    '''Can be made faster by using numpy's pad function, if present.'''
    a = numpy.zeros((x + 2, y + 2))
    a[1:-1, 1:-1] = data
    return a

data = pad(data)
# Kernel to get only the neighbours; change it if you only want
# diagonals or other shapes.
kernel = numpy.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
result_list = []

# Setting up functions for parallel processing
def log_result(result):
    result_list.append(result)

def max_getter(pixel):
    '''As this function is going to be used in a parallel processing
    environment, the data has to exist globally in order not to have
    to pickle it in the subprocess.'''
    # Multiplying by the kernel zeroes the central pixel, so the maximum
    # below is taken over the neighbours only.
    temp = data[pixel[0]-1:pixel[0]+2, pixel[1]-1:pixel[1]+2].copy() * kernel
    compare = numpy.max(temp) == temp
    coords = numpy.nonzero(compare)
    if len(coords[0]) != 1:
        # Discards every maximum which is not the first. Change if you want.
        coords = (coords[0][0], coords[1][0])
    # Converting back to global coordinates. This assumes the maximum is
    # unique in the subset; if this is not the case, adjust here.
    return (pixel, (pixel[0] + (coords[0].item() - 1),
                    pixel[1] + (coords[1].item() - 1)))

def parallell_max():
    # You can experiment with more workers if you have hyperthreading
    # and it is not correctly found by cpu_count.
    pool = mp.Pool()
    for i in range(1, x + 1):
        for j in range(1, y + 1):
            pool.apply_async(max_getter, args=((i, j),), callback=log_result)
    pool.close()
    pool.join()

# ___________ START parallel processing ________
if __name__ == '__main__':
    parallell_max()
    directions = {}
    for i in result_list:
        directions[i[0]] = i[1]
    # directions is a dictionary-like lookup table, where every pixel
    # gives the next pixel in the line.
    lines = []
    # The following loop can also be parallelized as seen above.
    for i in range(1, x + 1):
        for j in range(1, y + 1):
            line = []
            first, second = i, j
            for k in range(100):
                line.append((first, second))
                first, second = directions[(first, second)]
            lines.append(line)
    stop = time.time()
    print(stop - start)

1 Comment

Just a "bug" I didn't think of when writing this: I assumed that only positive integers are used. If you use arbitrary numbers, or floats, some lines have to be adjusted, but the principle should work nevertheless.

Very simply, numpy allows element-wise operations on its arrays without your having to loop over each of its dimensions.

Say you want to apply a simple operation to each element, e.g. multiplication by the scalar 2. Then (with import numpy as np) you can do either of the following:

array * 2

or

np.multiply(array, 2)

Depending on what you are doing inside your loop, you may be able to adapt other techniques to perform the work element-wise using vectorization.
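
As a sketch of how that idea could apply here (an assumption about your end goal, not a drop-in replacement): if what you need is the value of the brightest neighbour of every interior pixel, it can be computed in one shot with shifted slices, with no Python-level loop:

import numpy as np

a = np.random.randint(1, 30, size=(200, 200))

# Stack the 3x3 neighbourhood of every interior pixel as eight shifted
# views, then reduce element-wise across them.
neighbours = np.stack([
    a[:-2, 1:-1],   # U
    a[2:,  1:-1],   # D
    a[1:-1, 2:],    # R
    a[1:-1, :-2],   # L
    a[:-2, :-2],    # UL
    a[2:,  :-2],    # DL
    a[:-2, 2:],     # UR
    a[2:,  2:],     # DR
])
brightest = neighbours.max(axis=0)  # (198, 198) array of neighbour maxima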


  • Your first concern should be to see if you can carry out your calculations using numpy's element-wise operators.
  • If that doesn't work, look at the universal functions (ufuncs) built into numpy.

Both of these are coded in compiled C (or Fortran) and are much faster than looping in Python. Additionally, your code will be shorter and easier to understand.
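
For example, np.maximum is a ufunc: it compares two arrays element-wise in compiled code, and like all ufuncs it also provides reductions. A minimal sketch:

import numpy as np

a = np.array([1, 5, 2, 8])
b = np.array([4, 3, 6, 7])

print(np.maximum(a, b))      # element-wise maximum: [4 5 6 8]
print(np.maximum.reduce(a))  # ufunc reduction over the array: 8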

Additional factors that might improve performance are which compiler was used to compile numpy and which linear algebra library is used (assuming your code uses linear algebra). E.g. the ATLAS libraries are automatically tuned for the machine they are built on. Intel sells a Fortran compiler and math libraries that are supposed to be very fast on an Intel processor. IIRC, they also parallelize over all available cores.

If your math libraries don't use multiple cores automatically, using the multiprocessing module might be an option. Assuming the problem can be parallelized, this can reduce the runtime to (almost) 1/N of the single-core time, where N is the number of cores, minus of course the overhead necessary to distribute the problem and gather the results.
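
A minimal sketch of that idea, assuming the work splits into independent chunks (process_chunk here is a hypothetical placeholder for the real per-chunk computation):

import multiprocessing as mp
import numpy as np

def process_chunk(chunk):
    # Placeholder: stands in for whatever work is done per chunk.
    return chunk.sum()

if __name__ == '__main__':
    data = np.random.rand(1000, 1000)
    # One chunk of rows per core; each is processed in its own worker.
    chunks = np.array_split(data, mp.cpu_count())
    with mp.Pool() as pool:
        results = pool.map(process_chunk, chunks)
    print(sum(results))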

Alternatively, for problems that can be parallelized, you could use pyCUDA with numpy if you have an NVIDIA video card.

1 Comment

Considering that it's still not clear what @Sam actually wants, this is maybe the best answer there is. I would just add that, since the problem is highly image- and graph-related, scipy.ndimage and the basic shortest-path algorithms (e.g. A*) might help.

If your goal is to find the local maxima in the array, you can use scipy.ndimage.filters.maximum_filter with a 3×3 window and then check for equality:

import numpy
import scipy.ndimage

arr = numpy.array([[1, 1, 2, 8],
                   [5, 5, 4, 1],
                   [9, 5, 8, 8],
                   [7, 3, 6, 6]])
# A pixel is a local maximum if it equals the max of its 3x3 neighbourhood.
maxima = list(zip(*(scipy.ndimage.filters.maximum_filter(arr, 3) == arr).nonzero()))

The speed of this will depend greatly on whether you really need only the first 100 maxima and on how many maxima there are. If so, breaking out early will probably be faster. Refining your question with the real meat of what you are doing would help us find a better solution, though.
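
If only the first 100 maxima are needed, a sketch of that early break could look like this (note that the filter itself still scans the whole array; only the collection of coordinate pairs stops at 100):

from itertools import islice

import numpy
import scipy.ndimage

arr = numpy.random.randint(0, 100, size=(2000, 2000))
mask = scipy.ndimage.filters.maximum_filter(arr, 3) == arr
# In Python 3, zip yields (row, col) pairs lazily, so islice stops
# building coordinate tuples after the first 100.
first_100 = list(islice(zip(*mask.nonzero()), 100))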

1 Comment

"Refining your question with the real meat of what you are doing will help us to get a better solution, though." Let me echo this. You'll see that this solution is very different from the solution I gave. Depending on what your end goal is, this may or may not be a vastly better solution than mine. But we can't judge it.

Adding to the already good answers, here is a commented, fast version that gets everything into a list:

import numpy as np
import scipy.ndimage as ndi

# Data generation
data = np.random.randint(100, size=(2000, 2000))
# Maximum extraction using a 3x3 kernel
b = ndi.filters.maximum_filter(data, 3)
# Getting the first 100 entries of b as a 1-D array
# (note [:100], not [0:99], which would yield only 99 entries)
max_list = b.flatten()[:100]

In my test, this code took about 0.2 seconds including the data generation on my Intel i7 CPU, and about 3 seconds when the size of the array is 20k*2k. Timing seems to be no issue here, as I run into memory problems before execution time goes up noticeably.

Nevertheless, for larger amounts of data you can subdivide the exact same approach into smaller subarrays. Keep in mind that at some point the data handling will take more time than the computation itself.
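
A rough sketch of that subdivision (the chunk size is arbitrary here; one row of overlap on each side keeps the 3x3 window correct across chunk borders):

import numpy as np
import scipy.ndimage as ndi

data = np.random.randint(100, size=(20000, 2000))

results = []
chunk_rows = 2000
for start in range(0, data.shape[0], chunk_rows):
    # Overlap one row on each side so the 3x3 window sees across borders.
    lo = max(start - 1, 0)
    hi = min(start + chunk_rows + 1, data.shape[0])
    filtered = ndi.maximum_filter(data[lo:hi], 3)
    # Keep only the rows belonging to this chunk.
    results.append(filtered[start - lo : start - lo + chunk_rows])
maxima = np.vstack(results)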

3 Comments

It's not at all clear to me that the list b you're creating resembles the list in the original post. I'm not sure it isn't what the OP was after, but I don't think it's going to be the same thing.
The OP mentioned searching for the neighbouring element with the greatest value (i.e. the brightest pixel in a 3x3 submatrix, if I understood correctly). Of course, especially in image processing, there are other definitions of "neighbour" as well as of "maximum", but I admit that without the OP's comment this is searching in the dark. The first answer already shows how to access the indices of the maxima by comparing with the filter result. But as I guess it is an imaging problem, I nevertheless wanted to hint at timing issues compared to memory issues.
Your max_list will contain only the values of the maxima, repeated many times. See my answer for finding the indices.
