
I've been looking for an answer to this question for some time now: "What is the most efficient way to sort a million 32-bit integers?"

I feel that quicksort is the most efficient, with an average runtime of O(n log n) (and a worst case of O(n²)).

But some search results say that radix sort or merge sort is more efficient for sorting a million integers.

Any pointers?

  • There are many unanswered questions here that are important. For instance, is the input data partially sorted already, or is it totally random? What's the distribution of the data, etc.? Quicksort is well known more for being reasonably decent most of the time, and rarely really bad, than for being the ultimate sorting algorithm. There are certainly other contenders, for instance Timsort, used in Python: en.wikipedia.org/wiki/Timsort Commented Feb 21, 2011 at 19:17
  • If there were a single simple answer to this question, I think many of us would not have jobs. Commented Feb 21, 2011 at 19:38
  • What I find interesting is: what is the overall impact of sorting on your actual problem? I've seen developers spend days optimizing special algorithms like this, ignoring the fact that the sorting contributed only an insignificant amount to the actual runtime of the system, whereas in other areas changes could be made within minutes to circumvent the real bottlenecks. Or is this just a theoretical question? Commented Feb 21, 2011 at 19:44

4 Answers


Mergesort is guaranteed to be O(n lg n), but has a higher memory footprint than quicksort.

Quicksort generally runs faster than mergesort, but under some circumstances it can degrade to quadratic running time.

Radix sort is O(n*r), where r is the length of the numbers (the number of digits or bits per key).

To determine if radix is better than your chosen lg-n method, do this:

n * r < n * lg (n)
divide by n on both sides
r < lg(n)

We know r is 32 bits

32 < lg(n)

take 2 to the power of both sides

2^32 < 2^(lg(n))

2^32 < n

So if n is less than 2^32 (about 4 billion), use the lg-n algorithm. For the million integers in the question, lg(1,000,000) ≈ 20, which is well below 32, so this analysis favours the comparison sort.

Personally, I'd use quicksort, shuffling the input first if I have to in order to prevent it from degrading.
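As a rough illustration of that last point (my own sketch, not the poster's code, and note that C++'s std::sort is introsort rather than plain quicksort), shuffling the input first removes any adversarial ordering that could push a naive quicksort into its quadratic case:

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Shuffle first so an adversarial (e.g. already-sorted) input cannot push a
// naive quicksort into its quadratic worst case, then sort as usual.
void shuffle_then_sort(std::vector<std::int32_t>& values) {
    std::mt19937 rng(std::random_device{}());
    std::shuffle(values.begin(), values.end(), rng);
    std::sort(values.begin(), values.end());
}
```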


12 Comments

Ever thought about the meaning of the O(...) notation? It's not as simple as that.
@Yochai Sorting by digits is radix in base 10. Sorting by bits (which doesn't require the use of buckets) is done in base 2. You could use whatever base you want for it. If you can use base 10, why not just use base 2^32? You can take "32 bits" down to "8 digits" (of hex) if you want, or for that matter, if you want 256 buckets, you can take it down to 4... but you incur additional costs.
@Axel You have two parts to your comment, and I feel both are in need of explanation for them to be anything but flame bait.
You should know the actual computation if you're concerned with performance: if you get k*d*N iterations, it makes a big difference whether that factor is 1,000 or 10,000. Big O is just computational complexity; 100,000 iterations * N is still O(N).
@glowcoder But you completely ignored those constants in your answer and recommended "so if n is less than 2^32 (4 billion) then use the lg-n algorithm". This just won't work, because the threshold might as well be any other value, which is impossible to tell based on what is given in the OP's question.

If you have enough space, maybe you can try bucket sort (http://en.wikipedia.org/wiki/Bucket_sort). It can be more efficient, but requires additional memory to store the data.
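A minimal sketch of the idea, assuming unsigned 32-bit values (my illustration, not code from the answer): split the range into 256 buckets by the top byte, sort each bucket, then concatenate. The buckets are the extra memory the answer mentions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Bucket sort sketch for unsigned 32-bit integers: the top byte picks the
// bucket, so everything in bucket i is smaller than everything in bucket i+1.
std::vector<std::uint32_t> bucket_sort(const std::vector<std::uint32_t>& input) {
    std::vector<std::vector<std::uint32_t>> buckets(256);
    for (std::uint32_t x : input)
        buckets[x >> 24].push_back(x);           // distribute into buckets

    std::vector<std::uint32_t> out;
    out.reserve(input.size());
    for (auto& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end()); // sort each bucket locally
        out.insert(out.end(), bucket.begin(), bucket.end());
    }
    return out;
}
```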

1 Comment

They are equivalent. Bucket sort is a kind of Radix sort.

Merge sort is O(n log n) in its worst case, so it's going to be better than quicksort most of the time. Radix sorts, IIRC, are only really useful when each thing being sorted is the same length. They run in O(K * N), which is (item length) * (number of items). I don't think I've ever needed to use a radix sort.
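For reference, here is a minimal top-down merge sort sketch (my own, not the answerer's code) that also shows where the extra memory goes, namely the temporary buffer used by each merge:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Top-down merge sort: O(n log n) comparisons in the worst case, but needs a
// temporary buffer of the same size as the input for the merge step.
void merge_sort(std::vector<std::int32_t>& v, std::size_t lo, std::size_t hi,
                std::vector<std::int32_t>& tmp) {
    if (hi - lo < 2) return;                      // 0 or 1 elements: already sorted
    std::size_t mid = lo + (hi - lo) / 2;
    merge_sort(v, lo, mid, tmp);
    merge_sort(v, mid, hi, tmp);
    std::merge(v.begin() + lo, v.begin() + mid,
               v.begin() + mid, v.begin() + hi, tmp.begin() + lo);
    std::copy(tmp.begin() + lo, tmp.begin() + hi, v.begin() + lo);
}

// Usage: std::vector<std::int32_t> tmp(v.size()); merge_sort(v, 0, v.size(), tmp);
```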

4 Comments

"Better than quicksort most of the time" [citation needed].
What citation is needed? If your worst case is as good as the best case for another algorithm, that's all the citation you need; it's just plain math. But if you want to be all wiki about it, en.wikipedia.org/wiki/Merge_sort will tell you that merge sort's worst case does 39% fewer comparisons than quicksort's average case, so even in the average case you're still better off with a merge sort than a quicksort.
But that same Wikipedia article says "On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays.", so you're still better off with... wait, what?
Martinho -- The efficiency of an algorithm, and the one you choose, is also going to depend on your data structure. For instance, if you're storing data in a hash table, you won't often need to sort, since you're almost always going to have constant O(1) lookup time and sorting isn't necessary. Quicksort is easy enough to implement and performs fairly well, and is usually 'good enough', but without more information from the OP, and based solely on the math, merge sort is better without having to rely on compiler optimizations or specific architectures.

Radix is better for large numbers of items, especially when you know the range of the numbers.

Calculation fix:

Radix is O(k*N), where k is the number of digits in the largest number. (Actually it's about d*k*N, where d is the digit base, i.e., the number of buckets that will be used: alphabet = 26, decimal = 10, binary = 2.)

MaxInt: 2^32 = 4,294,967,296 possible values
32 bits: k = 32 / log2(d)

Base 10 Radix:

d*k*n = 10*10*n < nlogn .... 100 < logn ... n > 2^100  

Base 2 Radix:

d*k*n = 2*32*n < nlogn .... 64 < logn ... n > 2^64

So for 32-bit numbers, if you have more than 2^64 numbers, d*k*n (binary radix) is better than n*log(n).

But, if you know that the range will be up to 1024 , and not MAXINT for example:

MaxNumber = 1024

Base 10 Radix:

d*k*n = 10*4*n < nlogn .... 40 < logn ... n > 2^40 

Base 2 Radix:

d*k*n = 2*10*n < nlogn .... 20 < logn ... n > 2^20

So for numbers up to 1024, if you have more than 2^20 (about a million) numbers, d*k*n (binary radix) is better than n*log(n).

Because big O notation discards multiplicative constants on the running time, and ignores efficiency for low input sizes, it does not always reveal the fastest algorithm in practice or for practically-sized data sets, but the approach is still very effective for comparing the scalability of various algorithms as input sizes become large.
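To make the d*k*n bookkeeping concrete, here is a hedged sketch of an LSD radix sort for unsigned 32-bit values using d = 256 (one byte per digit), so k = 4 passes. The names and choice of base are my own illustration, not from the answer:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// LSD radix sort over unsigned 32-bit integers: 4 counting-sort passes, one
// per byte, each pass doing O(n + 256) work, plus an O(n) scratch buffer.
void radix_sort(std::vector<std::uint32_t>& v) {
    std::vector<std::uint32_t> buf(v.size());
    for (int pass = 0; pass < 4; ++pass) {
        const int shift = pass * 8;
        std::array<std::size_t, 257> count{};            // histogram of this byte
        for (std::uint32_t x : v)
            ++count[((x >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i)                    // prefix sums -> bucket starts
            count[i + 1] += count[i];
        for (std::uint32_t x : v)
            buf[count[(x >> shift) & 0xFF]++] = x;       // stable scatter by this byte
        v.swap(buf);
    }
}
```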

3 Comments

If I'm not mistaken, you also need to take lg(k) into account when using characters instead of bits, unless you're using buckets, which makes it bucket sort instead.
Radix does use buckets. I added the size of the alphabet for the number of buckets that will be used.
When using strings, d will be either 256 (a byte) or less if you know you're getting only alphanumeric or otherwise predefined, known characters. k will then be the number of characters (the average, but take the max for safety).
