
I've been looking for an answer to this question for some time now: "What is the most efficient way to sort a million 32-bit integers?"

I feel that quicksort is the most efficient, with an average runtime of O(n log n) (and a worst case of O(n²)).

But some search results say that radix sort or merge sort is more efficient for sorting a million integers.

Any pointers?

  • There are many unanswered questions here that are important. For instance, is the input data partially sorted already, or is it totally random? What's the distribution of the data, etc.? Quicksort is well known more for being reasonably decent most of the time, and rarely really bad, than for being the ultimate sorting algorithm. There are certainly other contenders, for instance Timsort, used in Python: en.wikipedia.org/wiki/Timsort Commented Feb 21, 2011 at 19:17
  • If there were a single simple answer to this question, I think many of us would not have jobs. Commented Feb 21, 2011 at 19:38
  • What I find interesting is: what is the overall impact of sorting on your actual problem? I've seen developers spend days optimizing special algorithms like this, ignoring the fact that the sorting contributed only an insignificant amount to the actual runtime of the system, whereas in other areas changes could be made within minutes to circumvent the real bottlenecks. Or is this just a theoretical question? Commented Feb 21, 2011 at 19:44

4 Answers


Mergesort is guaranteed to be O(n lg n), but has a higher memory footprint than quicksort.

Quicksort generally runs faster than mergesort, but under some circumstances it can degrade to quadratic running time.

Radix sort is O(n*r), where r is the length of the numbers (the number of digits or bits per key).

To determine if radix is better than your chosen lg-n method, do this:

n * r < n * lg (n)
divide by n on both sides
r < lg(n)

We know r is 32 bits

32 < lg(n)

take 2 to the power of both sides

2^32 < 2^(lg(n))

2^32 < n

So if n is less than 2^32 (about 4 billion), use the lg-n algorithm. For the million integers in the question, lg(1,000,000) ≈ 20, which is well below 32, so this analysis favours the comparison sort.

Personally, I'd use quicksort, shuffling the input first if I have to in order to prevent it from degrading.
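As a rough illustration of that last point (my own sketch, not the poster's code, and note that C++'s std::sort is introsort rather than plain quicksort), shuffling the input first removes any adversarial ordering that could push a naive quicksort into its quadratic case:

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Shuffle first so an adversarial (e.g. already-sorted) input cannot push a
// naive quicksort into its quadratic worst case, then sort as usual.
void shuffle_then_sort(std::vector<std::int32_t>& values) {
    std::mt19937 rng(std::random_device{}());
    std::shuffle(values.begin(), values.end(), rng);
    std::sort(values.begin(), values.end());
}
```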


12 Comments

Ever thought about the meaning of the O(...) notation? It's not as simple as that.
@Yochai Sorting by digits is radix in base 10. Sorting by bits (which doesn't require the use of buckets) is done in base 2. You could use whatever base you want for it. If you can use base 10, why not just use base 2^32? You can take "32 bits" down to "8 digits" (of hex) if you want, or for that matter, if you want 256 buckets, you can take it down to 4... but you incur additional costs.
@Axel You have two parts to your comment, and I feel both are in need of explanation for them to be anything but flame bait.
You should know the actual computation if you're concerned with performance: if you get k*d*N iterations, it makes a big difference whether that factor is 1,000 or 10,000. Big O is just computational complexity; 100,000 iterations * N is still O(N).
@glowcoder But you completely ignored those constants in your answer and recommended "so if n is less than 2^32 (4 billion) then use the lg-n algorithm". This just won't work, because the threshold might as well be any other value, which is impossible to tell based on what is given in the OP's question.

If you have enough space, maybe you can try bucket sort (http://en.wikipedia.org/wiki/Bucket_sort). It can be more efficient, but requires additional memory to store the data.
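A minimal sketch of the idea, assuming unsigned 32-bit values (my illustration, not code from the answer): split the range into 256 buckets by the top byte, sort each bucket, then concatenate. The buckets are the extra memory the answer mentions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Bucket sort sketch for unsigned 32-bit integers: the top byte picks the
// bucket, so everything in bucket i is smaller than everything in bucket i+1.
std::vector<std::uint32_t> bucket_sort(const std::vector<std::uint32_t>& input) {
    std::vector<std::vector<std::uint32_t>> buckets(256);
    for (std::uint32_t x : input)
        buckets[x >> 24].push_back(x);           // distribute into buckets

    std::vector<std::uint32_t> out;
    out.reserve(input.size());
    for (auto& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end()); // sort each bucket locally
        out.insert(out.end(), bucket.begin(), bucket.end());
    }
    return out;
}
```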

1 Comment

They are equivalent. Bucket sort is a kind of Radix sort.

Merge sort is O(n log n) in its worst case, so it's going to be better than quicksort most of the time. Radix sorts, IIRC, are only really useful when each thing being sorted is the same length. They run in O(K * N), which is (item length) * (number of items). I don't think I've ever needed to use a radix sort.
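For reference, here is a minimal top-down merge sort sketch (my own, not the answerer's code) that also shows where the extra memory goes, namely the temporary buffer used by each merge:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Top-down merge sort: O(n log n) comparisons in the worst case, but needs a
// temporary buffer of the same size as the input for the merge step.
void merge_sort(std::vector<std::int32_t>& v, std::size_t lo, std::size_t hi,
                std::vector<std::int32_t>& tmp) {
    if (hi - lo < 2) return;                      // 0 or 1 elements: already sorted
    std::size_t mid = lo + (hi - lo) / 2;
    merge_sort(v, lo, mid, tmp);
    merge_sort(v, mid, hi, tmp);
    std::merge(v.begin() + lo, v.begin() + mid,
               v.begin() + mid, v.begin() + hi, tmp.begin() + lo);
    std::copy(tmp.begin() + lo, tmp.begin() + hi, v.begin() + lo);
}

// Usage: std::vector<std::int32_t> tmp(v.size()); merge_sort(v, 0, v.size(), tmp);
```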

4 Comments

"Better than quicksort most of the time" [citation needed].
What citation is needed? If your worst case is as good as the best case for another algorithm, that's all the citation you need; it's just plain math. But if you want to be all wiki about it, en.wikipedia.org/wiki/Merge_sort will tell you that merge sort's worst case does 39% fewer comparisons than quicksort's average case, so even in the average case you're still better off with a merge sort than a quicksort.
But that same Wikipedia article says "On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays.", so you're still better off with... wait, what?
Martinho -- The efficiency of an algorithm, and the one you choose, is also going to depend on your data structure. For instance, if you're storing data in a hash table, you won't often need to sort, since you're almost always going to have constant O(1) lookup time and sorting isn't necessary. Quicksort is easy enough to implement and performs fairly well, and is usually 'good enough', but without more information from the OP, and based solely on the math, merge sort is better without having to rely on compiler optimizations or specific architectures.

Radix is better for large numbers of items, especially when you know the range of the numbers.

Calculation fix:

Radix is O(k*N), where k is the number of digits in the largest number. (Actually it's about d*k*N, where d is the digit base, i.e., the number of buckets that will be used: alphabet = 26, decimal = 10, binary = 2.)

MaxInt: 2^32 = 4,294,967,296 possible values
32 bits: k = 32 / log2(d)

Base 10 Radix:

d*k*n = 10*10*n < nlogn .... 100 < logn ... n > 2^100  

Base 2 Radix:

d*k*n = 2*32*n < nlogn .... 64 < logn ... n > 2^64

So for 32-bit numbers, if you have more than 2^64 numbers, d*k*n (binary radix) is better than n*log(n).

But, if you know that the range will be up to 1024 , and not MAXINT for example:

MaxNumber = 1024

Base 10 Radix:

d*k*n = 10*4*n < nlogn .... 40 < logn ... n > 2^40 

Base 2 Radix:

d*k*n = 2*10*n < nlogn .... 20 < logn ... n > 2^20

So for numbers up to 1024, if you have more than 2^20 (about a million) numbers, d*k*n (binary radix) is better than n*log(n).

Because big O notation discards multiplicative constants on the running time, and ignores efficiency for low input sizes, it does not always reveal the fastest algorithm in practice or for practically-sized data sets, but the approach is still very effective for comparing the scalability of various algorithms as input sizes become large.
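To make the d*k*n bookkeeping concrete, here is a hedged sketch of an LSD radix sort for unsigned 32-bit values using d = 256 (one byte per digit), so k = 4 passes. The names and choice of base are my own illustration, not from the answer:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// LSD radix sort over unsigned 32-bit integers: 4 counting-sort passes, one
// per byte, each pass doing O(n + 256) work, plus an O(n) scratch buffer.
void radix_sort(std::vector<std::uint32_t>& v) {
    std::vector<std::uint32_t> buf(v.size());
    for (int pass = 0; pass < 4; ++pass) {
        const int shift = pass * 8;
        std::array<std::size_t, 257> count{};            // histogram of this byte
        for (std::uint32_t x : v)
            ++count[((x >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i)                    // prefix sums -> bucket starts
            count[i + 1] += count[i];
        for (std::uint32_t x : v)
            buf[count[(x >> shift) & 0xFF]++] = x;       // stable scatter by this byte
        v.swap(buf);
    }
}
```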

3 Comments

If I'm not mistaken, you also need to take lg(k) into account when using characters instead of bits, unless you're using buckets, which makes it bucket sort instead.
Radix does use buckets. I added the size of the alphabet for the number of buckets that will be used.
When using strings, d will be either 256 (a byte) or less if you know you're getting only alphanumeric or otherwise predefined, known characters. k will then be the number of characters (the average, but take the max for safety).
