
Say I have a list (or NumPy array) of (row, col) coordinates, e.g.:

[(0, 0), (1, 1), (0, 0)]

I'd like to build the 2x2 array like this:

2 0
0 1

where each listed coordinate is counted and the count is stored at that position in the array; e.g. (0, 0) appears twice, so a[0, 0] == 2.

I know I can build this by iterating over the coordinates and incrementing the array one element at a time, but I wanted to check whether NumPy has built-in support for building an array like this, mostly for performance reasons. If so, can you point me in the right direction?
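For reference, the loop approach described above might look like the following minimal sketch. The function name `count_coords_loop` is hypothetical, and the output shape is assumed to be one past the largest row index and the largest column index.

```python
import numpy as np


def count_coords_loop(coords):
    """Count (row, col) pairs by visiting each one in a Python loop."""
    coords = list(coords)
    # Shape: one past the largest row index and the largest column index.
    n_rows = max(r for r, _ in coords) + 1
    n_cols = max(c for _, c in coords) + 1
    counts = np.zeros((n_rows, n_cols), dtype=int)
    for r, c in coords:
        counts[r, c] += 1  # one Python-level array update per coordinate
    return counts


print(count_coords_loop([(0, 0), (1, 1), (0, 0)]))
# [[2 0]
#  [0 1]]
```

Each iteration is a separate Python-level indexing operation, which is what makes this slow for large inputs.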

Also, is there reduce-like functionality along these lines, i.e. computing new = f(acc, el) for some function f instead of new = acc + el?
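Two existing tools come close, sketched here assuming only NumPy and the standard library. `np.add.at` performs unbuffered in-place addition, so a repeated index is added once per occurrence, whereas plain `a[rows, cols] += 1` reads and then writes, so a repeated pair is counted only once. For an arbitrary `f(acc, el)`, Python's `functools.reduce` performs exactly that fold, though at Python speed. The `bump` helper below is hypothetical.

```python
import functools

import numpy as np

coords = np.array([(0, 0), (1, 1), (0, 0)])
rows, cols = coords.T

# Vectorized: np.add.at is unbuffered, so the duplicate (0, 0) adds 1 twice.
counts = np.zeros((2, 2), dtype=int)
np.add.at(counts, (rows, cols), 1)

# Contrast: fancy-index += reads then writes, so the duplicate counts once.
buffered = np.zeros((2, 2), dtype=int)
buffered[rows, cols] += 1


def bump(acc, el):
    """Hypothetical f(acc, el): add one hit at el to acc (in place)."""
    acc[tuple(el)] += 1
    return acc


# General fold new = f(acc, el), starting from a zero array.
folded = functools.reduce(bump, coords, np.zeros((2, 2), dtype=int))

print(counts.tolist())    # [[2, 0], [0, 1]]
print(buffered.tolist())  # [[1, 0], [0, 1]]
print(folded.tolist())    # [[2, 0], [0, 1]]
```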

  • Added some timing to my answer. I did not expect that. Commented Oct 3, 2018 at 22:42

2 Answers


Move to flat indexing and use np.bincount.

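>>> import numpy as np
>>> coords = [(0, 0), (1, 1), (0, 0)]
>>> # shape: one past the largest row index and the largest column index
>>> shp = np.max(coords, axis=0) + 1
>>> # convert each (row, col) pair to a single flat (C-order) index
>>> flt = np.ravel_multi_index(np.moveaxis(coords, -1, 0), shp)
>>> # count each flat index, then restore the 2D shape
>>> result = np.bincount(flt, minlength=shp.prod()).reshape(shp)
>>> result
array([[2, 0],
       [0, 1]])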
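To see what the flat indexing step does: for a C-ordered array of shape `(n_rows, n_cols)`, `np.ravel_multi_index` maps `(row, col)` to `row * n_cols + col`, so `np.bincount` can count each cell as one integer bin. A small check of that arithmetic on the same coordinates (the variable `manual` is just the formula written out by hand):

```python
import numpy as np

coords = [(0, 0), (1, 1), (0, 0)]
shp = np.max(coords, axis=0) + 1          # array([2, 2])
rows, cols = np.transpose(coords)         # rows [0, 1, 0], cols [0, 1, 0]

flt = np.ravel_multi_index((rows, cols), shp)
manual = rows * shp[1] + cols             # row * n_cols + col

print(flt.tolist())                                     # [0, 3, 0]
print(np.bincount(flt, minlength=shp.prod()).tolist())  # [2, 0, 0, 1]
```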

EDIT: As pointed out by @MikeMiller, moveaxis is overkill here; np.transpose(coords), or coords.T if coords is already an array, is better. moveaxis would be more general if coords were for some reason more than 2D, but that doesn't look like a likely scenario.
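With the np.transpose suggestion applied, the approach can be wrapped in a small function. This is a sketch: the name `count_coords` and the optional `shape` parameter are hypothetical additions, the latter useful when the output should be larger than the largest coordinate implies.

```python
import numpy as np


def count_coords(coords, shape=None):
    """Count (row, col) pairs with flat indexing and np.bincount."""
    idx = np.transpose(coords)            # shape (2, N): rows, then cols
    if shape is None:
        shape = idx.max(axis=1) + 1       # one past the largest row/col
    shape = tuple(int(s) for s in shape)
    # Raises ValueError if any coordinate falls outside `shape`.
    flt = np.ravel_multi_index(tuple(idx), shape)
    return np.bincount(flt, minlength=int(np.prod(shape))).reshape(shape)


print(count_coords([(0, 0), (1, 1), (0, 0)]).tolist())         # [[2, 0], [0, 1]]
print(count_coords([(0, 0), (1, 1)], shape=(2, 3)).tolist())  # [[1, 0, 0], [0, 1, 0]]
```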


3 Comments

Do you think np.transpose(coords) is slightly clearer than np.moveaxis(coords, -1, 0)?
@MikeMiller Thanks, I updated the post.
Thanks @PaulPanzer!

Use np.unique() to count how many times each unique coordinate occurs (I wasn't sure whether this is the fastest way; it is not, see the timings below):

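import numpy as np

a = [(0, 0), (1, 1), (1, 0), (0, 0)]

b = np.array(a)
# unique (row, col) pairs and how many times each occurs
u, c = np.unique(b, axis=0, return_counts=True)
# shape: one past the largest row index and the largest column index
shp = tuple(b.max(axis=0) + 1)
ans = np.zeros(shp, dtype=int)
ans[u[:, 0], u[:, 1]] = c

# ans
array([[2, 0],
       [1, 1]])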

I did some timing:

# data preparation
max_coord = 10000
max_size = 100000

# max_size random (row, col) pairs in [0, max_coord), as a list of tuples
coords = [tuple(p) for p in np.random.randint(max_coord, size=(max_size, 2)).tolist()]

# timings using %timeit

# my solution
139 ms ± 592 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Paul Panzer's solution
142 ms ± 461 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# with max_size = 1000000
# my solution
827 ms ± 19.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Paul's solution
748 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
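A self-contained script along these lines could repeat the comparison outside IPython, using the standard `timeit` module instead of `%timeit`. It is a sketch: the function names are hypothetical, the sizes are smaller than above so it runs quickly, and it prints timings rather than claiming any particular result, since those depend on the machine and NumPy version. It also checks that both answers produce the same counts.

```python
import timeit

import numpy as np


def count_unique(coords):
    """np.unique answer: count unique rows, then scatter into a dense array."""
    b = np.asarray(coords)
    u, c = np.unique(b, axis=0, return_counts=True)
    ans = np.zeros(tuple(b.max(axis=0) + 1), dtype=int)
    ans[u[:, 0], u[:, 1]] = c
    return ans


def count_bincount(coords):
    """np.bincount answer: flatten (row, col) to one index, then count."""
    idx = np.transpose(coords)
    shp = idx.max(axis=1) + 1
    flt = np.ravel_multi_index(tuple(idx), shp)
    return np.bincount(flt, minlength=shp.prod()).reshape(shp)


rng = np.random.default_rng(0)
max_coord, max_size = 1000, 10000
coords = [tuple(p) for p in rng.integers(max_coord, size=(max_size, 2)).tolist()]

# Both approaches should produce identical count arrays.
assert np.array_equal(count_unique(coords), count_bincount(coords))

for fn in (count_unique, count_bincount):
    seconds = timeit.timeit(lambda: fn(coords), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```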

The two are almost the same speed. I don't know about their memory footprint, though; with max_size=1000000 and max_coord=100000, both solutions raise MemoryError on my machine. Still, I'd go with @Paul's solution: it is much neater (and faster when the data is big).
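With max_coord=100000 the dense result alone would have about 10^10 cells, so one option when it cannot fit is to skip the dense array and keep only the coordinates that actually occur, together with their counts. A minimal sketch, assuming a list of (row, col) pairs; the name `sparse_counts` is hypothetical:

```python
import numpy as np


def sparse_counts(coords):
    """Return the distinct (row, col) pairs (as rows) and how often each occurs."""
    pairs, counts = np.unique(np.asarray(coords), axis=0, return_counts=True)
    return pairs, counts


pairs, counts = sparse_counts([(0, 0), (1, 1), (1, 0), (0, 0)])
print(pairs.tolist())   # [[0, 0], [1, 0], [1, 1]]
print(counts.tolist())  # [2, 1, 1]
```

If SciPy is available, a `scipy.sparse.coo_matrix` built from the rows, columns, and a vector of ones also sums duplicate entries when converted to CSR format.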

