Converting a nested loop calculation to Numpy for speedup

Question

Part of my Python program contains the follow piece of code, where a new grid is calculated based on data found in the old grid.

The grid i a two-dimensional list of floats. The code uses three for-loops:

for t in xrange(0, t, step):
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w])
    gr = new_gr

return gr

The code is extremly slow for a large grid and a large time t.

I've tried to use Numpy to speed up this code, by substituting the inner loop with:

J = np.arange(1, width-1)
new_gr[h][J] = gr[h][J] + gr[h][J-1] ...

But the results produced (the floats in the array) are about 10% smaller than their list-calculation counterparts.

What loss of accuracy is to be expected when converting lists of floats to Numpy array of floats using np.array(pylist) and then doing a calculation?
How should I go about converting a triple for-loop to pretty and fast Numpy code? (or are there other suggestions for speeding up the code significantly?)

@hellpanderrr a new grid is calculated based on a long calculation where muliple values of the old grid is used. The grid is a part of my game engine. — darthur
– darthur, Commented Oct 24, 2015 at 16:47
Please explain what your given code is (and is supposed to be doing). It looks like you are applying a smoothing kernel to re-grid your data... please explain further. — DilithiumMatrix
– DilithiumMatrix, Commented Oct 24, 2015 at 16:47

Divakar · Accepted Answer · 2015-10-24 19:39:20Z

If gr is a list of floats, the first step if you are looking to vectorize with NumPy would be to convert gr to a NumPy array with np.array().

Next up, I am assuming that you have new_gr initialized with zeros of shape (height,width). The calculations being performed in the two innermost loops basically represent 2D convolution. So, you can use signal.convolve2d with an appropriate kernel. To decide on the kernel, we need to look at the scaling factors and make a 3 x 3 kernel out of them and negate them to simulate the calculations we are doing with each iteration. Thus, you would have a vectorized solution with the two innermost loops being removed for better performance, like so -

import numpy as np
from scipy import signal

# Get the scaling factors and negate them to get kernel
kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]])

# Initialize output array and run 2D convolution and set values into it
out = np.zeros((height,width))
out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]

Verify output and runtime tests

Define functions :

def org_app(gr,t):
    new_gr = np.zeros((height,width))
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w]) 
    return new_gr

def proposed_app(gr,t):
    kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]])
    out = np.zeros((height,width))
    out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]
    return out

Verify -

In [244]: # Inputs
     ...: gr = np.random.rand(40,50)
     ...: height,width = gr.shape
     ...: t = 1
     ...: 

In [245]: np.allclose(org_app(gr,t),proposed_app(gr,t))
Out[245]: True

Timings -

In [246]: # Inputs
     ...: gr = np.random.rand(400,500)
     ...: height,width = gr.shape
     ...: t = 1
     ...: 

In [247]: %timeit org_app(gr,t)
1 loops, best of 3: 2.13 s per loop

In [248]: %timeit proposed_app(gr,t)
10 loops, best of 3: 19.4 ms per loop

Thank you, this was very helpful. @hpaulj also helped me understand this

hpaulj · Accepted Answer · 2015-10-24 20:11:27Z

2

@Divakar, I tried a couple of variations on your org_app. The fully vectorized version is:

def org_app4(gr,t):
    new_gr = np.zeros((height,width))
    I = np.arange(1,height-1)[:,None]
    J = np.arange(1,width-1)
    new_gr[I,J] = gr[I,J] + gr[I,J-1] + gr[I-1,J] + t * gr[I+1,J-1]-2 * (gr[I,J-1] + t * gr[I-1,J])
    return new_gr

While half the speed of your proposed_app, it is closer in style to the original. And thus may help with understanding how nested loops can be vectorized.

An important step is the conversion of I into a column array, so that together I,J index a block of values.

answered Oct 24, 2015 at 20:11

hpaulj

233k14 gold badges260 silver badges392 bronze badges

1 Comment

Divakar Over a year ago

I like that broadcasted indices with one column array and another row array and also the fact that it's closer to original code.

Collectives™ on Stack Overflow

Converting a nested loop calculation to Numpy for speedup

2 Answers 2

1 Comment

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related