You can get rid of your dependent nested loops, but at the cost of wasted memory (more so than usual in vectorization problems). Your for loop addresses only a small subset of a 3d box. If you don't mind generating (n+1)^3 items in order to use only (n+1)(n+2)(n+3)/6 of them, and the full box fits into memory, the vectorized version will likely be much faster.
My suggestion, explanation below:
import numpy as np

def tet_vect(n):
    xv = np.array([
        [-1., -1., -1.],
        [ 1., -1., -1.],
        [-1.,  1., -1.],
        [-1., -1.,  1.],
    ])
    # spanning arrays of a 3d grid according to range(0, n+1)
    ii, jj, kk = np.ogrid[:n+1, :n+1, :n+1]
    # indices of the triples which fall inside the original for loop
    inds = (jj < n+1-ii) & (kk < n+1-ii-jj)
    # the [i,j,k] indices of the points that fall inside the for loop, in the same order
    combs = np.vstack(np.where(inds)).T
    # combs is now an (nsize, 3)-shaped array
    # compute the "l" column too
    lcol = n - combs.sum(axis=1)
    combs = np.hstack((combs, lcol[:, None]))
    # combs is now an (nsize, 4)-shaped array
    # all that's left is the matrix product of combs and xv, divided by n
    xg = np.matmul(combs, xv) / n
    return xg
The key step is generating the ii,jj,kk indices which span a 3d grid. This step is memory-efficient, since np.ogrid creates spanning arrays that can be used in broadcasting operations to refer to your full grid. We only generate a full (n+1)^3-sized array in the next step: the inds boolean array selects those points in the 3d box that lie inside your region of interest (and it does so by making use of array broadcasting). In the following step np.where(inds) picks out the indices of the True elements of this large array, and we end up with the much smaller number of nsize index triples. The single memory-wasting step is thus the creation of inds.
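To see the broadcasting at work, here's a minimal illustration with n = 2 (chosen small for readability); the True count matches the (n+1)(n+2)(n+3)/6 formula:

```python
import numpy as np

n = 2
ii, jj, kk = np.ogrid[:n+1, :n+1, :n+1]
# the spanning arrays are cheap: shapes (n+1,1,1), (1,n+1,1), (1,1,n+1)
print(ii.shape, jj.shape, kk.shape)  # (3, 1, 1) (1, 3, 1) (1, 1, 3)
# broadcasting the comparisons produces the full (n+1)^3 boolean box
inds = (jj < n+1-ii) & (kk < n+1-ii-jj)
print(inds.shape)  # (3, 3, 3)
# only (n+1)(n+2)(n+3)/6 of its entries are True
print(inds.sum())  # 10 == 3*4*5/6
```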
The rest is straightforward: we need to generate an additional column l = n - i - j - k for the array that contains the [i,j,k] indices in each row, which we get by subtracting the row sums of the array from n (again a vectorized operation). Once we have the (nsize,4)-shaped auxiliary array that contains each (i,j,k,l) in its rows, we only need to take the matrix product of this object with xv, and we're done.
Testing with small n sizes suggests that the above function produces the same results as yours. Timing for n = 100: 1.15 s for the original, 19 ms for the vectorized version.
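For reference, a sanity check along those lines; the loop version below is reconstructed from the index pattern above (i, j, k run over the same dependent ranges), so the exact original may differ cosmetically:

```python
import numpy as np

xv = np.array([
    [-1., -1., -1.],
    [ 1., -1., -1.],
    [-1.,  1., -1.],
    [-1., -1.,  1.],
])

def tet_loop(n):
    # reconstructed dependent nested loop (assumed shape of the original)
    pts = []
    for i in range(n+1):
        for j in range(n+1-i):
            for k in range(n+1-i-j):
                l = n - i - j - k
                pts.append((i*xv[0] + j*xv[1] + k*xv[2] + l*xv[3]) / n)
    return np.array(pts)

def tet_vect(n):
    ii, jj, kk = np.ogrid[:n+1, :n+1, :n+1]
    inds = (jj < n+1-ii) & (kk < n+1-ii-jj)
    combs = np.vstack(np.where(inds)).T
    lcol = n - combs.sum(axis=1)
    combs = np.hstack((combs, lcol[:, None]))
    return np.matmul(combs, xv) / n

# np.where walks the box in C order (i slowest, then j, then k),
# which matches the nested-loop order, so the points line up row for row
print(np.allclose(tet_loop(5), tet_vect(5)))  # True
```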