
I have a set of N points on a graph defined by (x,y) coordinates, as well as a table with their pairwise distance. I'm looking to generate a table with their relative "closeness ranking", e.g. if closeness[5][9] == 4, then node 9 is the fourth closest item relative to node 5.

The obvious way of doing this is to generate a list of indices for every i (1→n) and sort them by comparing d[i][j] < d[i][k], then transform the table using the fact that sorted[5][4] == 9 implies closeness[5][9] == 4.

This would require O(n² log n) time. I feel like there could be a more efficient way. Any ideas?
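For concreteness, the baseline approach described above can be sketched like this (a minimal illustration; the names `d`, `closeness_ranking`, and the sample points are mine, not from the original setup):

```python
# Baseline O(n^2 log n) approach: argsort each row of the pairwise
# distance table, then invert the permutation so that closeness[i][j]
# is j's rank among i's neighbours (0 = the point itself).
def closeness_ranking(d):
    n = len(d)
    closeness = [[0] * n for _ in range(n)]
    for i in range(n):
        # sorted_idx[r] == j means node j is the r-th closest to node i
        sorted_idx = sorted(range(n), key=lambda j: d[i][j])
        for rank, j in enumerate(sorted_idx):
            closeness[i][j] = rank  # sorted[i][rank] == j  =>  closeness[i][j] == rank
    return closeness

# Three points on a line at x = 0, 1, 3:
d = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
print(closeness_ranking(d))  # → [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
```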

4 Comments
  • Just looking through the table is O(n²), so O(n² log n) doesn't seem that bad. Are the distances Euclidean? Commented Dec 12, 2012 at 14:58
  • This might be significantly improved if you say you are not interested in all the distance orders, but just the first few closest points to each point. Commented Dec 13, 2012 at 9:41
  • @VaughnCato The additional log n does hurt performance quite a bit. The distances are Euclidean, but suggestions for the general case are welcome too. Commented Dec 14, 2012 at 15:17
  • @BorisStrandjev I can think of a "roughly" O(n²) solution for a tiny, fixed number of points, but I still need a significant portion. Think of 'n' in the range 1k-20k, and I'd want to be able to handle the ~10% top nodes. Stricter limits can be worked with if there's a very elegant solution. Commented Dec 14, 2012 at 15:20

1 Answer


Okay, I'm going to try and take a stab at this.

For the background knowledge: this problem is closely related to k-nearest-neighbor search. I'm not sure how you generated your pairwise distances, but a k-d tree is pretty good at solving this type of problem.

Now, even if you use a k-d tree, it will only help a bit (you query just what you need, instead of sorting ALL the points): building the tree takes O(N log N) time, then each query for one of the K nearest points takes O(log N) time. In the end, you are looking at O(N log N) + O(NK log N).
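As a sketch of what that route looks like in practice (assuming the Euclidean case and that SciPy is acceptable; the point set and K here are made up for illustration):

```python
import numpy as np
from scipy.spatial import KDTree

pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
K = 2  # how many nearest neighbours are actually needed per point

tree = KDTree(pts)                     # build: O(N log N)
dists, idx = tree.query(pts, k=K + 1)  # query k+1 because the nearest
idx = idx[:, 1:]                       # hit for each point is itself

print(idx)  # row i lists the K nearest neighbours of point i, closest first
```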

Okay, now for the actual heuristic part. This will depend on your data; you might want to check whether the points are close together or far apart. But you can try a divide-and-conquer approach where you divide the plane into bins. When you need to find the closest points, find out which bin the point you are working with belongs to, then work only with the neighboring bins, exploring more neighboring bins as you need more points.
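A minimal sketch of that binning idea, with square cells of a hypothetical `cell_size` (all names here are mine). The key correctness detail is the stopping test: keep expanding rings of cells until the k-th best candidate found so far is within `r * cell_size`, since the rings scanned by then are guaranteed to cover every point that close. It also assumes k < number of points, or the loop never terminates.

```python
import math
from collections import defaultdict

def build_grid(points, cell_size):
    """Hash each point index into its square grid cell."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)
    return grid

def k_nearest(points, grid, cell_size, q, k):
    """Indices of the k nearest points to point q (assumes k < len(points))."""
    qx, qy = points[q]
    cx, cy = int(qx // cell_size), int(qy // cell_size)
    best, r = [], 0
    while True:
        # scan the ring of cells at Chebyshev radius r around q's cell
        for gx in range(cx - r, cx + r + 1):
            for gy in range(cy - r, cy + r + 1):
                if max(abs(gx - cx), abs(gy - cy)) != r:
                    continue
                for i in grid.get((gx, gy), ()):
                    if i != q:
                        best.append((math.hypot(points[i][0] - qx,
                                                points[i][1] - qy), i))
        best = sorted(best)[:k]
        # rings scanned so far cover every point within distance r*cell_size,
        # so once the k-th candidate is inside that radius we are done
        if len(best) == k and best[-1][0] <= r * cell_size:
            return [i for _, i in best]
        r += 1

pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (0.0, 2.0)]
g = build_grid(pts, 1.0)
print(k_nearest(pts, g, 1.0, 0, 2))  # → [1, 3]
```

Performance hinges on the cell size matching the point density, which is exactly where clustered vs. scattered distributions behave differently.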

Hopefully this helps, good luck.


2 Comments

Thanks for the input. The k-d tree is unlikely to provide much improvement, as I do need a reasonable number for K and in terms of practical efficiency I think sorting will have it beat, especially since the different sorts are independent and thus easily parallelized.
On the heuristic part: I did have a type of binning/pigeon-holing solution in mind, especially since a categorical/clustered sort would be sufficient for my application. However, my tests have trouble getting consistent and accurate results on certain point-distributions (e.g. highly clustered vs randomly scattered). Any ideas how to refine the idea?
