Fastest way to build numpy array from sum of coordinates [duplicate]

Question

Say I have a list (or numpy.array) of (row, col) coordinates, e.g.:

[(0, 0), (1, 1), (0, 0)]

I'd like to build the 2x2 array like this:

2 0
0 1

where each of the listed coordinates is counted and put in the right place in the array. I.e. (0, 0) appears twice, so a[0, 0] == 2.

I know I can build this by iterating and poking the array for each element, but I wanted to check if there is any support in numpy regarding building the array like this, mostly for performance reasons. Can you point me in the right direction if so?

Also, is there a reduce-like functionality along the above lines? I.e. do new = f(acc, el) instead of new = acc + el.
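On the reduce-like part, one possible direction (not something the answers below cover) is the `.at` method that NumPy ufuncs expose: it applies the ufunc in place, unbuffered, so repeated indices accumulate instead of overwriting each other. A minimal sketch, where `values` is a hypothetical per-coordinate payload invented for illustration:

```python
import numpy as np

coords = np.array([(0, 0), (1, 1), (0, 0)])
values = np.array([5, 7, 9])  # hypothetical per-coordinate values, not from the question
rows, cols = coords.T

# new = acc + el: repeated (row, col) pairs each add 1 to the same cell.
counts = np.zeros((2, 2), dtype=int)
np.add.at(counts, (rows, cols), 1)

# new = f(acc, el) with f = max: keep the largest value seen per cell.
largest = np.zeros((2, 2), dtype=int)
np.maximum.at(largest, (rows, cols), values)

print(counts)
print(largest)
```

This only covers functions that already exist as binary ufuncs (`np.add`, `np.maximum`, `np.minimum`, `np.multiply`, and so on). For plain counting it is worth timing against `np.bincount`, since `ufunc.at` has a reputation for being the slower of the two.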

Added some timing to my answer. I did not expect that.

– today, Commented Oct 3, 2018 at 22:42

Paul Panzer · Accepted Answer · 2018-10-03 23:18:11Z

3

Move to flat indexing and use np.bincount.

>>> import numpy as np
>>>
>>> coords = [(0, 0), (1, 1), (0, 0)]
>>>
>>> shp = np.max(coords, axis=0) + 1
>>> flt = np.ravel_multi_index(np.moveaxis(coords, -1, 0), shp)
>>> result = np.bincount(flt, minlength=shp.prod()).reshape(shp)
>>>
>>> result
array([[2, 0],
       [0, 1]])

EDIT: As pointed out by @MikeMiller, moveaxis is overkill here; np.transpose(coords) or, if coords happens to be an array, coords.T is better. moveaxis would be more general if coords were for some reason more than 2D, but that doesn't look like a likely scenario.
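For reference, the transpose variant can be wrapped up roughly as follows. This is a sketch only: the function name `count_coords` and the optional `shape` argument are illustrative additions, not part of the answer above.

```python
import numpy as np

def count_coords(coords, shape=None):
    """Count occurrences of (row, col) pairs into a dense 2D array."""
    coords = np.asarray(coords)
    if shape is None:
        # Smallest shape that can hold every coordinate.
        shape = tuple(int(n) for n in coords.max(axis=0) + 1)
    # ravel_multi_index wants one index array per dimension, hence the transpose.
    flat = np.ravel_multi_index(tuple(coords.T), shape)
    # minlength pads the counts so that trailing empty cells are not dropped.
    return np.bincount(flat, minlength=int(np.prod(shape))).reshape(shape)

result = count_coords([(0, 0), (1, 1), (0, 0)])
print(result)
```

Passing an explicit `shape` helps when the array should be larger than the largest coordinate seen, for example `count_coords([(0, 1)], shape=(2, 3))`.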

edited Oct 3, 2018 at 23:18

answered Oct 3, 2018 at 22:11

Paul Panzer


3 Comments

Michael Miller Over a year ago

Do you think np.transpose(coords) is slightly clearer than np.moveaxis(coords, -1, 0)?

Paul Panzer Over a year ago

@MikeMiller Thanks, I updated the post.

levant pied Over a year ago

Thanks @PaulPanzer!

today · Accepted Answer · 2018-10-03 22:56:28Z

Using np.unique() to count how often each unique coordinate occurs (this is not the fastest way; see the timings below):

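import numpy as np

a = [(0,0), (1,1), (1,0), (0,0)]

b = np.array(a)
# u: the unique (row, col) pairs; c: how many times each pair occurs
u, c = np.unique(b, axis=0, return_counts=True)
m = np.max(b)+1          # side length; this assumes a square result
ans = np.zeros((m, m))   # float array, hence the "2." in the output
ans[u[:,0], u[:,1]] = c  # each unique pair appears once, so plain assignment is safe

# ans
array([[ 2.,  0.],
       [ 1.,  1.]])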

I did some timing:

# data preparation
max_coord = 10000
max_size = 100000

# this is awful, I know it can be done much better...
# (scalar draws: calling int() on a size-1 array is deprecated in recent NumPy)
coords = [(int(np.random.randint(max_coord)),
           int(np.random.randint(max_coord))) for _ in range(max_size)]

# timings using %timeit

# my solution
139 ms ± 592 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Paul Panzer's solution
142 ms ± 461 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# with max_size = 1000000
# my solution
827 ms ± 19.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Paul's solution
748 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The two are almost the same at the smaller size. I have not measured their memory footprint; with max_size=1000000 and max_coord=100000 both solutions raise MemoryError on my machine, which is not surprising since the dense result would need 100000 x 100000 entries. However, I'll go with @Paul's solution: it is much neater (and faster when the data is big).
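A rough way to rerun this comparison outside IPython is sketched below. The function names `via_unique` and `via_bincount` are made up here, the sizes are deliberately smaller than the ones above so that it runs quickly, and the numbers it prints will differ from machine to machine.

```python
import timeit
import numpy as np

def via_unique(coords):
    # np.unique approach: count distinct rows, then scatter the counts.
    b = np.asarray(coords)
    u, c = np.unique(b, axis=0, return_counts=True)
    m = int(b.max()) + 1
    ans = np.zeros((m, m), dtype=int)
    ans[u[:, 0], u[:, 1]] = c
    return ans

def via_bincount(coords):
    # Flat-index approach: one integer per coordinate, then np.bincount.
    b = np.asarray(coords)
    m = int(b.max()) + 1  # square shape, to match via_unique
    flat = np.ravel_multi_index(tuple(b.T), (m, m))
    return np.bincount(flat, minlength=m * m).reshape(m, m)

# Smaller than the sizes used above; a list of tuples, as in the original test.
rng = np.random.default_rng(0)
max_coord, max_size = 1000, 100_000
coords = [tuple(p) for p in rng.integers(max_coord, size=(max_size, 2)).tolist()]

# Both approaches should agree before their timings are worth comparing.
same = np.array_equal(via_unique(coords), via_bincount(coords))
print("results agree:", same)

for fn in (via_unique, via_bincount):
    best = min(timeit.repeat(lambda: fn(coords), number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```

Because `coords` is a Python list of tuples, both timings include the cost of converting it with `np.asarray`, which may account for a good part of the total.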
