
Assume a 2D array is divided into several sub-regions, given by a second array regions, and another array is filled with values. I would like to aggregate the values by sub-region. The following code is my solution.

But when the number of sub-regions is very large, the iteration takes a lot of time. Is there any way to accelerate the program? I suppose NumPy could do this, but I don't know how.

import numpy as np

regions = np.array([[0,0,0,1],
                    [0,0,1,1],
                    [1,1,1,2],
                    [2,2,2,2]], dtype=np.int32)
value_array = np.array([[9,5,8,4],
                        [6,4,8,5],
                        [4,5,9,7],
                        [4,7,3,0]], dtype=np.float32)

aggre_array = np.zeros_like(value_array)
for r in range(regions.max()+1):
    region = regions==r
    aggre_array[region] = value_array[region].mean()
print(aggre_array)
'''output
[[6.4       6.4       6.4       5.8333335]
 [6.4       6.4       5.8333335 5.8333335]
 [5.8333335 5.8333335 5.8333335 4.2      ]
 [4.2       4.2       4.2       4.2      ]]
'''

1 Answer

For this kind of grouping you need to work with flattened versions of the arrays, sorted by the indices that sort the first array, like:

regions_sort = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
value_array_sort = [9, 5, 8, 6, 4, 4, 8, 5, 4, 5, 9, 7, 4, 7, 3, 0]

The next step is to find the indices that separate the individual groups and use them to calculate the group sums, counts and means:

marker_idx = [5, 11]
group_counts = [5, 6, 5]
group_sums = [32, 35, 21]
group_means = [6.4, 5.83333, 4.2]

Finally, repeat the group means so that the result matches value_array, rearrange them back into the original order, and reshape to the initial shape.

sorter = np.argsort(regions.ravel())
_, inverse_sorter = np.unique(sorter, return_index=True) #could be optimised...
regions_sort = regions.ravel()[sorter]
value_array_sort = value_array.ravel()[sorter]

marker_idx = np.flatnonzero(np.diff(regions_sort))+1
reduceat_idx = np.r_[0, marker_idx]
group_counts = np.diff(marker_idx, prepend=0, append=regions.size) #could also use np.bincount...
group_sums = np.add.reduceat(value_array_sort, reduceat_idx) #sum the values, not the region labels
group_means = group_sums / group_counts

new_values = np.repeat(group_means, group_counts)
new_value_array = new_values[inverse_sorter].reshape(value_array.shape)

>>> new_value_array    
array([[6.4       , 6.4       , 6.4       , 5.83333333],
       [6.4       , 6.4       , 5.83333333, 5.83333333],
       [5.83333333, 5.83333333, 5.83333333, 4.2       ],
       [4.2       , 4.2       , 4.2       , 4.2       ]])
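The np.bincount hint in the code comment above can be made concrete. As a sketch, a bincount-based variant avoids the sort entirely, since region labels are small non-negative integers: one weighted bincount gives the per-region sums, a plain bincount gives the counts, and fancy indexing with regions broadcasts each mean back onto its cells.

```python
import numpy as np

regions = np.array([[0, 0, 0, 1],
                    [0, 0, 1, 1],
                    [1, 1, 1, 2],
                    [2, 2, 2, 2]], dtype=np.int32)
value_array = np.array([[9, 5, 8, 4],
                        [6, 4, 8, 5],
                        [4, 5, 9, 7],
                        [4, 7, 3, 0]], dtype=np.float32)

labels = regions.ravel()
# Per-region sums and counts in one pass each, no sorting required
group_sums = np.bincount(labels, weights=value_array.ravel())
group_counts = np.bincount(labels)
group_means = group_sums / group_counts

# Indexing the 1D means array with the 2D label array
# broadcasts each mean back onto its region
aggre_array = group_means[regions]
```

This stays O(n) in the number of cells, whereas the sort-based approach is O(n log n).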

I've also found a way to do it with the numpy_indexed package, which is designed for solving grouping problems efficiently:

import numpy_indexed as npi
groupby = npi.group_by(regions.ravel())
keys, values = groupby.mean(value_array.ravel())
>>> values[groupby.inverse].reshape(regions.shape)

array([[6.4       , 6.4       , 6.4       , 5.83333333],
       [6.4       , 6.4       , 5.83333333, 5.83333333],
       [5.83333333, 5.83333333, 5.83333333, 4.2       ],
       [4.2       , 4.2       , 4.2       , 4.2       ]])
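As a further aside (not part of the original answer), scipy.ndimage.mean handles exactly this labeled-aggregation problem in one call, assuming SciPy is available:

```python
import numpy as np
from scipy import ndimage

regions = np.array([[0, 0, 0, 1],
                    [0, 0, 1, 1],
                    [1, 1, 1, 2],
                    [2, 2, 2, 2]], dtype=np.int32)
value_array = np.array([[9, 5, 8, 4],
                        [6, 4, 8, 5],
                        [4, 5, 9, 7],
                        [4, 7, 3, 0]], dtype=np.float32)

# ndimage.mean computes the mean of value_array over each label value in regions
group_means = ndimage.mean(value_array, labels=regions,
                           index=np.arange(regions.max() + 1))
aggre_array = group_means[regions]
```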
