1

I have a list of N=3 points like this as input: points = [[1, 1], [2, 2], [4, 4]]

I wrote this code to compute all possible distances between all elements of my list points, as dist = min(∣x1−x2∣,∣y1−y2∣):

distances = []
for i in range(N-1):
    for j in range(i+1,N):
        dist = min((abs(points[i][0]-points[j][0]), abs(points[i][1]-points[j][1])))
        distances.append(dist)
print(distances)

My output will be the array distances with all the distances saved in it: [1, 3, 2]

It works fine with N=3, but I would like to compute it in a more efficiently way and be free to set N=10^5. I am trying to use also numpy and scipy, but I am having a little trouble with replacing the loops and use the correct method.

Can anybody help me please? Thanks in advance

4
  • 4
    Do you know how many distances you will be calculating with a set of 10^5 points? Do you think that's a reasonable thing to try to calculate? Commented Aug 31, 2020 at 15:28
  • 2
    You can use cdist, or you can use simple broadcasting. But as @MarkMeyer says, your data is just too big for pair-wise distance. Basically 5*10^9 pairs, which is about 40GB of memory. Commented Aug 31, 2020 at 15:29
  • Hi! I know, it sounds crazy, but in this task (I am just doing some exercises to improve myself) csacademy.com/contest/archive/task/strange-distance/statement there's written that N could be 10^5. So, I though it was an idea (not necessarily a good one) to aim to it. Obviously, this is not the solution of the task, but just a part of it (the code in this question is not for k-th distance of course) Commented Aug 31, 2020 at 15:58
  • Ho can cdist improve my code? Shall I just substitute min((abs(points[i][0]-points[j][0]), abs(points[i][1]-points[j][1]))) with cdist(i, j) or do something else? Commented Aug 31, 2020 at 16:00

1 Answer 1

4

The numpythonic solution

To compute your distances using the full power of Numpy, and do it substantially faster:

  1. Convert your points to a Numpy array:

     pts = np.array(points)
    
  2. Then run:

     dist = np.abs(pts[np.newaxis, :, :] - pts[:, np.newaxis, :]).min(axis=2)
    

Here the result is a square array. But if you want to get a list of elements above the diagonal, just like your code generates, you can run:

dist2 = dist[np.triu_indices(pts.shape[0], 1)].tolist()

I ran this code for the following 9 points:

points = [[1, 1], [2, 2], [4, 4], [3, 5], [2, 8], [4, 10], [3, 7], [2, 9], [4, 7]]

For the above data, the result saved in dist (a full array) is:

array([[0, 1, 3, 2, 1, 3, 2, 1, 3],
       [1, 0, 2, 1, 0, 2, 1, 0, 2],
       [3, 2, 0, 1, 2, 0, 1, 2, 0],
       [2, 1, 1, 0, 1, 1, 0, 1, 1],
       [1, 0, 2, 1, 0, 2, 1, 0, 1],
       [3, 2, 0, 1, 2, 0, 1, 1, 0],
       [2, 1, 1, 0, 1, 1, 0, 1, 0],
       [1, 0, 2, 1, 0, 1, 1, 0, 2],
       [3, 2, 0, 1, 1, 0, 0, 2, 0]])

and the list of elements from upper diagonal part is:

[1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 0, 2, 1, 0, 2, 1, 2, 0, 1, 2, 0, 1, 1, 0, 1, 1,
  2, 1, 0, 1, 1, 1, 0, 1, 0, 2]

How faster is my code

It turns out that even for such small sample like I used (9 points), my code works 2 times faster. For a sample of 18 points (not presented here) - 6 times faster.

This difference in speed has been gained even though my function computes "2 times more than needed" i.e. it generates a full array, whereas the lower diagonal part of the result in a "mirror view" of the upper diagonal part (what computes your code).

For bigger number of points the difference should be much bigger. Make your test on a bigger sample of points (say 100 points) and write how many times faster was my code.

Sign up to request clarification or add additional context in comments.

1 Comment

Your code and your explanation are fantastic!! Thank you very much for your help!!!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.