
I have to sort a large array of 100,000 doubles.

The point is that I do not want to sort the whole array but only find the largest 20,000 elements in descending order.

Currently I am using selection sort. Any way to improve the performance?

4 Comments
  • @Balaswamy What does a stable sort buy you here, exactly? Commented Jan 31, 2012 at 4:49
  • Are you asking for something substantially different from taking the first 20,000 elements and sorting them into descending order? Commented Jan 31, 2012 at 4:59
  • You want to find the Top 20K items, or any run of 20K items ? Commented Jan 31, 2012 at 13:52
  • Must be the top 20K, nothing else really makes sense. Plus the asker mentioned somewhere below that they're running selection sort 20,000 times, and top 20K is what that'll give you. Commented Jan 31, 2012 at 22:30

5 Answers


100,000 is not a very large array on most modern devices. Are you sure you can't just sort all of them using a standard library sorting function?

You can avoid a full sort by using a variation of heapsort. Normally in a heapsort you build a heap of the entire data set (100,000 elements in your case). Instead, only allow the heap to grow to 20,000 elements, and keep the smallest of those elements at the top of the heap (a min-heap). Once the heap is full (20,000 elements), compare each subsequent element of the data set to the top of the heap. If the next element is smaller than the top of the heap, just skip it. If it's larger than the top of the heap, pop the top of the heap and insert the element from the data set.

Once you've gone through the entire data set, the heap holds the 20,000 largest elements. Popping them one by one gives ascending order, so write them into an array from back to front (or reverse the array afterwards) to get the descending order you want.

This algorithm runs in O(N log K) time, where N is the size of the data set (100,000 in your example) and K is the number of elements you want to keep (20,000 in your example).
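A minimal sketch of that bounded-heap approach, assuming C++ since the question doesn't name a language (std::priority_queue with std::greater gives the min-heap described above; the function name topK is just for illustration):

    #include <cstddef>
    #include <functional>
    #include <queue>
    #include <vector>

    // Return the k largest values of data in descending order.
    std::vector<double> topK(const std::vector<double>& data, std::size_t k) {
        // Min-heap: the smallest of the kept elements sits on top.
        std::priority_queue<double, std::vector<double>, std::greater<double>> heap;

        for (double x : data) {
            if (heap.size() < k) {
                heap.push(x);
            } else if (x > heap.top()) {
                heap.pop();      // drop the smallest kept value
                heap.push(x);    // keep the larger one instead
            }
        }

        // Popping a min-heap gives ascending order, so fill the result back to front.
        std::vector<double> result(heap.size());
        for (std::size_t i = result.size(); i-- > 0; ) {
            result[i] = heap.top();
            heap.pop();
        }
        return result;
    }

If the language really is C++, note that std::partial_sort(v.begin(), v.begin() + k, v.end(), std::greater<double>()) does the whole job in one call, also in O(N log K).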


2 Comments

I think you can improve this a bit. You're building the heap in O(N log K) and retrieving the values in O(K log K). Instead, you can build the entire 100,000-element heap in O(N), and just pull 20,000 values from it in O(K log N). K log N < N log K, so you might come out in front (though the constants could easily change that).
That could definitely be faster. Note that if you can't modify the original array (or you receive the data as a stream) you will need O(N) storage instead of O(K) storage.
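For what it's worth, a sketch of the alternative described in the first comment above, again assuming C++: heapify everything in O(N) with std::make_heap, then pop only the first K values.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Return the k largest values of data in descending order by heapifying
    // a copy of the whole array in O(N) and popping k times in O(K log N).
    std::vector<double> topKFullHeap(std::vector<double> data, std::size_t k) {
        std::make_heap(data.begin(), data.end());    // max-heap over all N elements

        std::vector<double> result;
        result.reserve(k);
        auto end = data.end();
        while (result.size() < k && end != data.begin()) {
            std::pop_heap(data.begin(), end);        // move current max to *(end - 1)
            --end;
            result.push_back(*end);                  // arrives in descending order
        }
        return result;
    }

Taking data by value here is the O(N) extra storage mentioned in the reply; if you are allowed to rearrange the original array, pass it by reference and that extra space goes away.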

I'd suggest starting with bucket sort and then using one of the simpler algorithms to sort each bucket. If any bucket is still too big, you can either bucket-sort it again or fall back to another O(n log n) method (such as mergesort or quicksort). Otherwise, selection sort (or better, insertion sort) will do just fine.

Just for comparison: selection sort and insertion sort are O(n²), quicksort's worst case is O(n²), mergesort is O(n log n), and bucket sort with k buckets averages roughly O(n + n²/k + k) when the values spread evenly across the buckets, so with enough buckets it beats the comparison sorts on average.

Note: quicksort's worst-case scenario is O(n²), but in practice it is much faster.

Update: the bound above is bucket sort's average performance, not its worst case (if every value lands in one bucket you're back to whatever algorithm sorts the buckets), so the same note above applies.
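A rough sketch of that approach, assuming C++ and assuming the doubles are spread fairly evenly between their minimum and maximum (the usual bucket-sort caveat); each bucket is finished off with std::sort for brevity, and the function name is made up for illustration:

    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <vector>

    // Sort data in descending order with a single bucket-sort pass.
    void bucketSortDescending(std::vector<double>& data, std::size_t numBuckets) {
        if (data.size() < 2 || numBuckets == 0) return;

        auto mm = std::minmax_element(data.begin(), data.end());
        const double lo = *mm.first, hi = *mm.second;
        if (lo == hi) return;                        // all values equal, nothing to do

        // Distribute values into buckets by mapping their range linearly onto [0, numBuckets).
        std::vector<std::vector<double>> buckets(numBuckets);
        for (double x : data) {
            auto b = static_cast<std::size_t>((x - lo) / (hi - lo) * static_cast<double>(numBuckets));
            if (b >= numBuckets) b = numBuckets - 1; // clamp x == hi into the last bucket
            buckets[b].push_back(x);
        }

        // Concatenate from the highest bucket down, each bucket sorted descending.
        std::size_t out = 0;
        for (std::size_t b = numBuckets; b-- > 0; ) {
            std::sort(buckets[b].begin(), buckets[b].end(), std::greater<double>());
            for (double x : buckets[b]) data[out++] = x;
        }
    }

For the top-20,000 use case you could stop sorting and concatenating buckets as soon as 20,000 values have been written out.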



If you use the bubble sort algorithm and on each pass keep the larger of every adjacent pair on the left, then after the 20,000th pass the smallest numbers will sit at the end of the array in descending order. (Since you want the largest 20,000, run the passes from right to left instead, so the largest values bubble toward the front; after 20,000 passes the first 20,000 positions hold the largest values in descending order.)
For example, for the array 3 7 2 5 1 4 8:
Iteration 1: 7 3 5 2 4 8 1
Iteration 2: 7 5 3 4 8 2 1
Iteration 3: 7 5 4 8 3 2 1

After the 3rd iteration the 3 smallest elements sit at the end in descending order.
I recommend this because the complexity depends on the number of elements you want to sort: if you only need a small number of elements, the program finishes quickly. The complexity is O(kn), where k is the number of elements you want to get.
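A sketch of that partial bubble sort, assuming C++ and bubbling the larger values toward the front so the first k slots end up holding the k largest elements in descending order (the direction the question actually asks for); the function name is illustrative:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // After k right-to-left passes, data[0..k-1] hold the k largest values
    // in descending order. Runs in O(k * n) time, in place.
    void partialBubbleDescending(std::vector<double>& data, std::size_t k) {
        const std::size_t n = data.size();
        if (k > n) k = n;
        for (std::size_t pass = 0; pass < k; ++pass) {
            // Bubble the largest remaining element up to position 'pass'.
            for (std::size_t i = n - 1; i > pass; --i) {
                if (data[i] > data[i - 1]) {
                    std::swap(data[i], data[i - 1]);
                }
            }
        }
    }

With k = 20,000 and n = 100,000 this is still on the order of 2 billion comparisons, which mostly illustrates why the heap- and quicksort-based answers scale better.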

2 Comments

No, because he wants only some of the elements. The complexity is O(kn), where k is the number of elements he wants to get. If he wants to get only one element, this algorithm is O(n).
But that is pretty much what I am doing. Running selection sort 20000 times.

You can get the first K sorted elements with a modified quicksort. The key is to realise that, once you've reordered your list around the pivot, you can forget about sorting the right-hand side whenever the pivot's final position is ≥ K.

In short, just replace the "right-hand" recursive call to quicksort() with

if (pivot < k) quicksort(...)

Alternatively, you could follow the standard heapsort algorithm, but stop after pulling K elements from the heap.

Both of these approaches take O(N + K log N) time (expected time, in the quicksort case) and work in place, needing no significant storage beyond the original O(N) array.
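A sketch of that modified quicksort, assuming C++; the function and variable names are made up for illustration, and the pivot is simply the middle element of the current range:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Lomuto-style partition into *descending* order; returns the pivot's final index.
    std::size_t partitionDesc(std::vector<double>& a, std::size_t lo, std::size_t hi) {
        std::swap(a[lo + (hi - lo) / 2], a[hi]);     // use the middle element as pivot
        const double pivot = a[hi];
        std::size_t store = lo;
        for (std::size_t i = lo; i < hi; ++i) {
            if (a[i] > pivot) std::swap(a[i], a[store++]);
        }
        std::swap(a[store], a[hi]);
        return store;
    }

    // Sort a[0..k-1] into descending order; elements at index >= k are left
    // unsorted among themselves (but are all <= a[k-1]).
    void partialQuicksort(std::vector<double>& a, std::size_t lo, std::size_t hi, std::size_t k) {
        if (lo >= hi) return;
        const std::size_t p = partitionDesc(a, lo, hi);
        if (p > lo) partialQuicksort(a, lo, p - 1, k);   // the left side always matters
        if (p < k) partialQuicksort(a, p + 1, hi, k);    // skip the right side once the pivot is at or past k
    }

    // Convenience wrapper: put the k largest values of a in a[0..k-1], descending.
    void partialQuicksort(std::vector<double>& a, std::size_t k) {
        if (!a.empty()) partialQuicksort(a, 0, a.size() - 1, k);
    }

As with any quicksort, the worst case is still quadratic; a randomised pivot choice keeps that unlikely in practice.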



You can improve efficiency by using the quicksort algorithm, or you can use mergesort, which will do this in O(n log n) time. Work out the running time of both and see which suits your scenario.

4 Comments

I tried quicksort first. But quicksort takes more time than selection sort, since I have to quicksort the whole array and not just 20,000 elements.
So look into the merge sort algorithm; that will be better. Otherwise it's because your element values are not suited to n log(n) running time.
It's not a matter of how "suitable" the data is. The selection sort is faster here because, if you only need the first 20,000 elements, you can terminate it after 20,000 iterations. We're looking for a faster sort with the same property. Mergesort doesn't have it. Quicksort does, but you haven't explained how.
In the quicksort algorithm you first take the length of the array and then partition it, so limit that length to the part you actually need sorted.
