I have the following nested for loop (random data is used for simplicity):

import numpy as np 

lat_idx = np.random.randint(121, size = 4800)
lon_idx = np.random.randint(201, size = (4800,4800))
sum_cell = np.zeros((121,201))
data = np.random.rand(4800,4800)
for j in range(4800):
    for i in range(4800):
        if lat_idx[i] < 0 or lon_idx[i, j] < 0: 
            continue
        sum_cell[lat_idx[i], lon_idx[i, j]] += data[i, j]

#print(sum_cell)

Is there a way to write this as a matrix operation or with some vectorized NumPy call? At the moment it is really slow. My problem is that lon_idx depends on both i and j.

  • I suspect you'll have to use the unbuffered np.add.at function, since you'll be combining multiple data[i, j] elements into one sum_cell entry. Commented Nov 27, 2019 at 7:54
  • How can lat_idx[i] < 0 or lon_idx[i, j] < 0 be true, if you are using only positive numbers? Commented Nov 27, 2019 at 7:58
  • @Aryerez This line is a remnant from my original code where negative values are possible. Commented Nov 27, 2019 at 8:00
  • Is the indent correct here? Or why are you checking that if statement even if there is no action depending on that? Anyway I would suggest to cast lat_idx to the same dimension as lon_idx and then work with a mask for your condition. Commented Nov 27, 2019 at 9:31
  • @clearseplex I misread the last part of the code; I apologize for the incorrect answer. Commented Nov 27, 2019 at 9:45
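
To illustrate the first comment's point about np.add.at being unbuffered, here is a minimal sketch with toy arrays (idx, vals, buffered, and unbuffered are illustrative names, not part of the question). With a repeated index, a fancy-indexed += keeps only the last write, while np.add.at accumulates every contribution:

```python
import numpy as np

# Toy input: index 0 appears twice
idx = np.array([0, 0, 1])
vals = np.array([1.0, 2.0, 3.0])

# Fancy-indexed in-place add: for the repeated index, only the last write survives
buffered = np.zeros(2)
buffered[idx] += vals

# Unbuffered add: every contribution to a repeated index is accumulated
unbuffered = np.zeros(2)
np.add.at(unbuffered, idx, vals)

print(buffered)    # [2. 3.]
print(unbuffered)  # [3. 3.]
```

This is why a plain sum_cell[lat, lon] += data style update would not reproduce the loop when several (i, j) pairs map to the same cell.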

1 Answer

This is how you can do that in a vectorized way:

import numpy as np

# Make input data
np.random.seed(0)
data = np.random.rand(4800, 4800)
# Add some negative values in indices
lat_idx = np.random.randint(-20, 121, size=4800)
lon_idx = np.random.randint(-50, 201, size=(4800, 4800))
# Output array
sum_cell = np.zeros((121, 201))
# Make mask for non-negative indices
lat_idx2 = lat_idx[:, np.newaxis]
m = (lat_idx2 >= 0) & (lon_idx >= 0)
# Get the valid index pairs (lat_idx2 is broadcast to the shape of lon_idx)
lat_pos, lon_pos = np.broadcast_to(lat_idx2, m.shape)[m], lon_idx[m]
# Add values
np.add.at(sum_cell, (lat_pos, lon_pos), data[m])
# Check result with previous method
sum_cell2 = np.zeros((121, 201))
for j in range(4800):
    for i in range(4800):
        if lat_idx[i] < 0 or lon_idx[i, j] < 0: 
            continue
        sum_cell2[lat_idx[i], lon_idx[i, j]] += data[i, j]
print(np.allclose(sum_cell, sum_cell2))
# True
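
As a possible alternative to np.add.at, the same accumulation can be sketched with np.bincount on flattened indices. This is not part of the answer above, only a variant under stated assumptions: the function name sum_cells_bincount and the reduced problem size n = 200 are made up for illustration, and no timing has been measured here, so whether it is faster will depend on the NumPy version and the data.

```python
import numpy as np

def sum_cells_bincount(data, lat_idx, lon_idx, shape):
    """Sum data into a grid of the given shape, skipping negative indices.

    Hypothetical helper for illustration; assumes all non-negative
    indices are within the bounds of `shape`.
    """
    # Broadcast the per-row latitude index to the 2-D shape of lon_idx
    lat2d = np.broadcast_to(lat_idx[:, np.newaxis], lon_idx.shape)
    mask = (lat2d >= 0) & (lon_idx >= 0)
    # Collapse each valid (lat, lon) pair into one flat index into the grid
    flat = np.ravel_multi_index((lat2d[mask], lon_idx[mask]), shape)
    # bincount adds up all weights that share the same flat index
    sums = np.bincount(flat, weights=data[mask], minlength=shape[0] * shape[1])
    return sums.reshape(shape)

# Smaller random input than in the question so the check runs quickly
rng = np.random.default_rng(0)
n = 200
data = rng.random((n, n))
lat_idx = rng.integers(-20, 121, size=n)
lon_idx = rng.integers(-50, 201, size=(n, n))

result = sum_cells_bincount(data, lat_idx, lon_idx, (121, 201))

# Reference result using np.add.at, as in the answer
expected = np.zeros((121, 201))
m = (lat_idx[:, np.newaxis] >= 0) & (lon_idx >= 0)
lat_pos = np.broadcast_to(lat_idx[:, np.newaxis], m.shape)[m]
np.add.at(expected, (lat_pos, lon_idx[m]), data[m])
print(np.allclose(result, expected))
```

The design choice is to turn the 2-D scatter-add into a 1-D weighted histogram: np.ravel_multi_index maps each index pair to a single position, and minlength ensures that cells receiving no data still appear as zeros before the reshape.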