Compute distances between all points in array efficiently using Python

Question

I have a list of N=3 points like this as input: points = [[1, 1], [2, 2], [4, 4]]

I wrote this code to compute all possible distances between all elements of my list points, as dist = min(∣x1−x2∣,∣y1−y2∣):

distances = []
for i in range(N-1):
    for j in range(i+1,N):
        dist = min((abs(points[i][0]-points[j][0]), abs(points[i][1]-points[j][1])))
        distances.append(dist)
print(distances)

My output will be the array distances with all the distances saved in it: [1, 3, 2]

It works fine with N=3, but I would like to compute it in a more efficiently way and be free to set N=10^5. I am trying to use also numpy and scipy, but I am having a little trouble with replacing the loops and use the correct method.

Can anybody help me please? Thanks in advance

Do you know how many distances you will be calculating with a set of 10^5 points? Do you think that's a reasonable thing to try to calculate? — Mark
– Mark, Commented Aug 31, 2020 at 15:28
You can use cdist, or you can use simple broadcasting. But as @MarkMeyer says, your data is just too big for pair-wise distance. Basically 5*10^9 pairs, which is about 40GB of memory. — Quang Hoang
– Quang Hoang, Commented Aug 31, 2020 at 15:29
Hi! I know, it sounds crazy, but in this task (I am just doing some exercises to improve myself) csacademy.com/contest/archive/task/strange-distance/statement there's written that N could be 10^5. So, I though it was an idea (not necessarily a good one) to aim to it. Obviously, this is not the solution of the task, but just a part of it (the code in this question is not for k-th distance of course) — hellomynameisA
– hellomynameisA, Commented Aug 31, 2020 at 15:58
Ho can cdist improve my code? Shall I just substitute min((abs(points[i][0]-points[j][0]), abs(points[i][1]-points[j][1]))) with cdist(i, j) or do something else? — hellomynameisA
– hellomynameisA, Commented Aug 31, 2020 at 16:00

Valdi_Bo · Accepted Answer · 2020-08-31 18:10:26Z

The numpythonic solution

To compute your distances using the full power of Numpy, and do it substantially faster:

Convert your points to a Numpy array:
```
 pts = np.array(points)
```

Then run:

 dist = np.abs(pts[np.newaxis, :, :] - pts[:, np.newaxis, :]).min(axis=2)

Here the result is a square array. But if you want to get a list of elements above the diagonal, just like your code generates, you can run:

dist2 = dist[np.triu_indices(pts.shape[0], 1)].tolist()

I ran this code for the following 9 points:

points = [[1, 1], [2, 2], [4, 4], [3, 5], [2, 8], [4, 10], [3, 7], [2, 9], [4, 7]]

For the above data, the result saved in dist (a full array) is:

array([[0, 1, 3, 2, 1, 3, 2, 1, 3],
       [1, 0, 2, 1, 0, 2, 1, 0, 2],
       [3, 2, 0, 1, 2, 0, 1, 2, 0],
       [2, 1, 1, 0, 1, 1, 0, 1, 1],
       [1, 0, 2, 1, 0, 2, 1, 0, 1],
       [3, 2, 0, 1, 2, 0, 1, 1, 0],
       [2, 1, 1, 0, 1, 1, 0, 1, 0],
       [1, 0, 2, 1, 0, 1, 1, 0, 2],
       [3, 2, 0, 1, 1, 0, 0, 2, 0]])

and the list of elements from upper diagonal part is:

[1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 0, 2, 1, 0, 2, 1, 2, 0, 1, 2, 0, 1, 1, 0, 1, 1,
  2, 1, 0, 1, 1, 1, 0, 1, 0, 2]

How faster is my code

It turns out that even for such small sample like I used (9 points), my code works 2 times faster. For a sample of 18 points (not presented here) - 6 times faster.

This difference in speed has been gained even though my function computes "2 times more than needed" i.e. it generates a full array, whereas the lower diagonal part of the result in a "mirror view" of the upper diagonal part (what computes your code).

For bigger number of points the difference should be much bigger. Make your test on a bigger sample of points (say 100 points) and write how many times faster was my code.

Your code and your explanation are fantastic!! Thank you very much for your help!!!

Collectives™ on Stack Overflow

Compute distances between all points in array efficiently using Python

1 Answer 1

The numpythonic solution

How faster is my code

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

The numpythonic solution

How faster is my code

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related