Vectorization of nested for loop in Python

Question

I have the following nested for loop (random data for simplicity):

import numpy as np 

lat_idx = np.random.randint(121, size = 4800)
lon_idx = np.random.randint(201, size = (4800,4800))
sum_cell = np.zeros((121,201))
data = np.random.rand(4800,4800)
for j in range(4800):
    for i in range(4800):
        if lat_idx[i] < 0 or lon_idx[i, j] < 0: 
            continue
        sum_cell[lat_idx[i], lon_idx[i, j]] += data[i, j]

#print(sum_cell)

Is there a way to write it as a matrix operation or with some NumPy function? At the moment it is really slow. My problem is that lon_idx depends on both i and j.

I suspect you'll have to use the unbuffered np.add.at function, since you'll be combining multiple d[i,j] elements into one sum_cell.
– hpaulj, Commented Nov 27, 2019 at 7:54
How can lat_idx[i] < 0 or lon_idx[i, j] < 0 be true, if you are using only positive numbers?
– Aryerez, Commented Nov 27, 2019 at 7:58
@Aryerez This line is a remnant from my original code where negative values are possible.
– clearseplex, Commented Nov 27, 2019 at 8:00
Is the indent correct here? Or why are you checking that if statement even if there is no action depending on that? Anyway I would suggest to cast lat_idx to the same dimension as lon_idx and then work with a mask for your condition.
– some_name.py, Commented Nov 27, 2019 at 9:31
@clearseplex I misread last part of the code, I apologize for the incorrect answer.
– FBruzzesi, Commented Nov 27, 2019 at 9:45
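
The comments above mention that np.add.at is unbuffered. A minimal sketch of why that matters when the same index appears more than once (the tiny arrays here are made up for illustration and are not from the question):

```python
import numpy as np

idx = np.array([0, 0, 1])         # index 0 appears twice
vals = np.array([1.0, 1.0, 1.0])

# Buffered: fancy-index += reads, adds, then writes back,
# so a repeated index is only incremented once.
buffered = np.zeros(2)
buffered[idx] += vals

# Unbuffered: np.add.at accumulates every contribution.
unbuffered = np.zeros(2)
np.add.at(unbuffered, idx, vals)

print(buffered)    # [1. 1.]
print(unbuffered)  # [2. 1.]
```

Since many (i, j) pairs map to the same cell of sum_cell, the plain `sum_cell[lat, lon] += data` form would silently drop contributions, which is why the unbuffered variant is suggested.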

javidcf · Accepted Answer · 2019-11-27 11:20:24Z

This is how you can do that in a vectorized way:

import numpy as np

# Make input data
np.random.seed(0)
data = np.random.rand(4800, 4800)
# Add some negative values in indices
lat_idx = np.random.randint(-20, 121, size=4800)
lon_idx = np.random.randint(-50, 201, size=(4800, 4800))
# Output array
sum_cell = np.zeros((121, 201))
# Make mask for non-negative indices (0 is a valid index)
lat_idx2 = lat_idx[:, np.newaxis]
m = (lat_idx2 >= 0) & (lon_idx >= 0)
# Get non-negative indices
lat_pos, lon_pos = np.broadcast_to(lat_idx2, m.shape)[m], lon_idx[m]
# Add values
np.add.at(sum_cell, (lat_pos, lon_pos), data[m])
# Check result with previous method
sum_cell2 = np.zeros((121, 201))
for j in range(4800):
    for i in range(4800):
        if lat_idx[i] < 0 or lon_idx[i, j] < 0: 
            continue
        sum_cell2[lat_idx[i], lon_idx[i, j]] += data[i, j]
print(np.allclose(sum_cell, sum_cell2))
# True
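
As a possible alternative to np.add.at, the same accumulation can be sketched with np.bincount over flattened cell indices. This is not part of the answer above; the helper name bin_sum and the small array size are assumptions made for illustration. np.bincount is often reported to be faster than np.add.at for this kind of sum, but that depends on the NumPy version and the data, so it is worth timing on your own input.

```python
import numpy as np

def bin_sum(data, lat_idx, lon_idx, shape):
    """Sum data[i, j] into out[lat_idx[i], lon_idx[i, j]], skipping negative indices.

    Hypothetical helper, not from the original answer.
    """
    # Broadcast the 1-D latitude indices to the 2-D shape of lon_idx
    lat2d = np.broadcast_to(lat_idx[:, np.newaxis], lon_idx.shape)
    mask = (lat2d >= 0) & (lon_idx >= 0)
    # Convert (lat, lon) pairs to flat indices into the output grid
    flat = np.ravel_multi_index((lat2d[mask], lon_idx[mask]), shape)
    # Weighted bincount sums all values that share a flat index
    sums = np.bincount(flat, weights=data[mask], minlength=shape[0] * shape[1])
    return sums.reshape(shape)

# Small made-up input (the question uses 4800 x 4800)
rng = np.random.default_rng(0)
n = 200
data = rng.random((n, n))
lat_idx = rng.integers(-20, 121, size=n)
lon_idx = rng.integers(-50, 201, size=(n, n))

result = bin_sum(data, lat_idx, lon_idx, (121, 201))

# Cross-check against np.add.at
expected = np.zeros((121, 201))
lat2d = np.broadcast_to(lat_idx[:, np.newaxis], lon_idx.shape)
mask = (lat2d >= 0) & (lon_idx >= 0)
np.add.at(expected, (lat2d[mask], lon_idx[mask]), data[mask])
print(np.allclose(result, expected))
```

The minlength argument keeps the output the full grid size even when the highest-numbered cells receive no data.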
