3

Part of my Python program contains the follow piece of code, where a new grid is calculated based on data found in the old grid.

The grid i a two-dimensional list of floats. The code uses three for-loops:

for t in xrange(0, t, step):
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w])
    gr = new_gr

return gr

The code is extremly slow for a large grid and a large time t.

I've tried to use Numpy to speed up this code, by substituting the inner loop with:

J = np.arange(1, width-1)
new_gr[h][J] = gr[h][J] + gr[h][J-1] ...

But the results produced (the floats in the array) are about 10% smaller than their list-calculation counterparts.

  • What loss of accuracy is to be expected when converting lists of floats to Numpy array of floats using np.array(pylist) and then doing a calculation?

  • How should I go about converting a triple for-loop to pretty and fast Numpy code? (or are there other suggestions for speeding up the code significantly?)

12
  • Can you explain what those loops do? Commented Oct 24, 2015 at 16:29
  • With arrays you can (and should) index with gr[h, J]. Commented Oct 24, 2015 at 16:30
  • @hpaulj thanks, I understand this is supposed to be faster. Commented Oct 24, 2015 at 16:45
  • @hellpanderrr a new grid is calculated based on a long calculation where muliple values of the old grid is used. The grid is a part of my game engine. Commented Oct 24, 2015 at 16:47
  • Please explain what your given code is (and is supposed to be doing). It looks like you are applying a smoothing kernel to re-grid your data... please explain further. Commented Oct 24, 2015 at 16:47

2 Answers 2

4

If gr is a list of floats, the first step if you are looking to vectorize with NumPy would be to convert gr to a NumPy array with np.array().

Next up, I am assuming that you have new_gr initialized with zeros of shape (height,width). The calculations being performed in the two innermost loops basically represent 2D convolution. So, you can use signal.convolve2d with an appropriate kernel. To decide on the kernel, we need to look at the scaling factors and make a 3 x 3 kernel out of them and negate them to simulate the calculations we are doing with each iteration. Thus, you would have a vectorized solution with the two innermost loops being removed for better performance, like so -

import numpy as np
from scipy import signal

# Get the scaling factors and negate them to get kernel
kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]])

# Initialize output array and run 2D convolution and set values into it
out = np.zeros((height,width))
out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]

Verify output and runtime tests

Define functions :

def org_app(gr,t):
    new_gr = np.zeros((height,width))
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w]) 
    return new_gr

def proposed_app(gr,t):
    kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]])
    out = np.zeros((height,width))
    out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]
    return out

Verify -

In [244]: # Inputs
     ...: gr = np.random.rand(40,50)
     ...: height,width = gr.shape
     ...: t = 1
     ...: 

In [245]: np.allclose(org_app(gr,t),proposed_app(gr,t))
Out[245]: True

Timings -

In [246]: # Inputs
     ...: gr = np.random.rand(400,500)
     ...: height,width = gr.shape
     ...: t = 1
     ...: 

In [247]: %timeit org_app(gr,t)
1 loops, best of 3: 2.13 s per loop

In [248]: %timeit proposed_app(gr,t)
10 loops, best of 3: 19.4 ms per loop
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you, this was very helpful. @hpaulj also helped me understand this
2

@Divakar, I tried a couple of variations on your org_app. The fully vectorized version is:

def org_app4(gr,t):
    new_gr = np.zeros((height,width))
    I = np.arange(1,height-1)[:,None]
    J = np.arange(1,width-1)
    new_gr[I,J] = gr[I,J] + gr[I,J-1] + gr[I-1,J] + t * gr[I+1,J-1]-2 * (gr[I,J-1] + t * gr[I-1,J])
    return new_gr

While half the speed of your proposed_app, it is closer in style to the original. And thus may help with understanding how nested loops can be vectorized.

An important step is the conversion of I into a column array, so that together I,J index a block of values.

1 Comment

I like that broadcasted indices with one column array and another row array and also the fact that it's closer to original code.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.