Stable, efficient sort?

Question

I'm trying to create an unusual associative array implementation that is very space-efficient, and I need a sorting algorithm that meets all of the following:

1. Stable (does not change the relative ordering of elements with equal keys).
2. In-place or almost in-place (O(log n) stack is fine, but no O(n) space usage or heap allocations).
3. O(n log n) time complexity.

Also note that the data structure to be sorted is an array.

It's easy to see that there's a basic algorithm that matches any 2 of these three (insertion sort matches 1 and 2, merge sort matches 1 and 3, heap sort matches 2 and 3), but I cannot for the life of me find anything that matches all three of these criteria.

Will your data have regular updates? If so, putting it in one huge array is a bad idea. Consider a structure that can be fragmented, such as a B-tree or rope. – finnw, Commented Oct 4, 2008 at 14:35
It seems odd to be happy with O(n log n) time complexity but have an issue with O(n) space usage. Could you elaborate on what your actual objective is? There's a risk you are falling into the XY problem trap. – mikera, Commented Jan 27, 2012 at 15:51
Block merge sorts are in-place, stable variations of merge sort; there are several variants, such as Grail sort. Space complexity can be O(1) and time complexity O(n log(n)), about 50% slower than a standard merge sort. – rcgldr, Commented Jun 3 at 2:55

jjnguy · Accepted Answer · 2008-09-23 20:33:22Z

10

I believe merge sort can be written to be in-place. That may be the best route.

edited Sep 23, 2008 at 20:33

answered Sep 22, 2008 at 3:25

jjnguy

1 Comment

Corey D Over a year ago

comjnl.oxfordjournals.org/cgi/content/abstract/35/6/643 This is probably the algorithm you want.

Tyler · Accepted Answer · 2008-09-22 04:14:33Z

9

Note: standard quicksort is not O(n log n)! In the worst case it can take O(n^2) time. The problem is that you might pivot on an element that is far from the median, so that your recursive calls are highly unbalanced.

There is a way to combat this, which is to carefully pick a pivot that is guaranteed, or at least very likely, to be close to the median. Surprisingly, you can actually find the exact median in linear time, although in your case it sounds like you care about speed, so I would not suggest this.

I think the most practical approach is to implement a stable quicksort (it's easy to keep stable) but use the median of 5 random values as the pivot at each step. This makes a slow sort highly unlikely, and the result is stable.
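The pivot rule suggested here (median of 5 random values) can be sketched in Java as a standalone helper. This is only the pivot selection, assuming an `int[]` and a caller-supplied `java.util.Random`; the class and method names (`MedianOfFivePivot`, `medianOfFiveRandom`) are hypothetical, and the helper does not by itself make quicksort stable.

```java
import java.util.Random;

public class MedianOfFivePivot {
    // Returns the index, within [from, to), of the median of five elements sampled
    // uniformly at random (with replacement) from that range.
    public static int medianOfFiveRandom(int[] a, int from, int to, Random rng) {
        if (to <= from) {
            throw new IllegalArgumentException("empty range");
        }
        int[] idx = new int[5]; // fixed-size scratch space: O(1), independent of n
        for (int i = 0; i < 5; i++) {
            idx[i] = from + rng.nextInt(to - from);
        }
        // Insertion-sort the sampled indices by the values they refer to.
        for (int i = 1; i < 5; i++) {
            int cur = idx[i];
            int j = i - 1;
            while (j >= 0 && a[idx[j]] > a[cur]) {
                idx[j + 1] = idx[j];
                j--;
            }
            idx[j + 1] = cur;
        }
        return idx[2]; // the third of five sorted samples is their median
    }
}
```

A quicksort would call this once per partitioning step and use `a[p]` as the pivot value; stability still has to come from the partitioning scheme. If even a five-element allocation is unwanted, the samples can be kept in five local variables instead.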

By the way, merge sort can be done in-place, although it's tricky to do both in-place and stable.

answered Sep 22, 2008 at 4:14

Tyler

2 Comments

Michael Deardeuff Over a year ago

Fundamentals of Algorithms pg 237 describes a way to make quicksort O(n log n) except if all elements are the same. It recursively picks the median to pivot on, returning the pivoted list which quicksort then recurses down. Having said that, I agree that the median of 5 is the best way to do it.

Kelly Bundy May 31 at 13:31

What's that easy stable quicksort?

davenpcj · Accepted Answer · 2008-09-22 04:07:14Z

3

There's a list of sort algorithms on Wikipedia. It includes categorization by execution time, stability, and allocation.

Your best bet is probably going to be modifying an efficient unstable sort to be stable, thereby making it less efficient.

answered Sep 22, 2008 at 4:07

davenpcj

davenpcj · Accepted Answer · 2008-09-22 03:25:36Z

2

What about quicksort?

Exchange sort can do that too and might be more "stable" by your terms, but quicksort is faster.

answered Sep 22, 2008 at 3:25

davenpcj

2 Comments

freespace Over a year ago

The example given in en.wikipedia.org/wiki/Quicksort#Algorithm is stable, though not the most efficient version of qsort.

cjm Over a year ago

It's my understanding that variations of Quicksort can be made stable, or efficient, but not both at the same time.

Rafał Dowgird · Accepted Answer · 2008-09-22 08:32:47Z

2

There is a class of stable in-place merge algorithms, although they are complicated and run in linear time with a rather high constant hidden in the O(n). To learn more, have a look at this article and its bibliography.

Edit: the merge phase is linear, so the merge sort is O(n log n).

answered Sep 22, 2008 at 8:32

Rafał Dowgird

1 Comment

rcgldr Jun 3 at 2:50

With an optimized block merge sort, the constant is about 1.5, about 50% slower than a standard merge sort.

Eric · Accepted Answer · 2008-10-04 14:26:01Z

1

Because your elements are in an array (rather than, say, a linked list) you have some information about their original order available to you in the array indices themselves. You can take advantage of this by writing your sort and comparison functions to be aware of the indices:

function cmp( ar, idx1, idx2 )
{
   // first compare elements as usual
   var rc = (ar[idx1]<ar[idx2]) ? -1 : ( (ar[idx1]>ar[idx2]) ? 1 : 0 );

   // if the elements are equal, then compare their positions
   if( rc == 0 )
      rc = (idx1<idx2) ? -1 : ((idx1>idx2) ? 1 : 0);

   return rc;
}

This technique can be used to make any sort stable, as long as the sort ONLY performs element swaps. The indices of elements will change, but the relative order of equal elements will stay the same, so the sort remains stable. It won't work out of the box for a sort like heapsort because the original heapification "throws away" the relative ordering, though you might be able to adapt the idea to other sorts.

answered Oct 4, 2008 at 14:26

Eric

2 Comments

Konrad Rudolph Over a year ago

I was going to propose the same thing.

wnoise Over a year ago

This won't work for all algorithms. A sort could compare a_1 with some b, causing it to get swapped relative to some a_2 between them. You may be able to use it for some, but you have a hefty proof obligation.

paxdiablo · Accepted Answer · 2009-08-14 12:42:18Z

Quicksort can be made stable reasonably easily, simply by adding a sequence field to each record, initializing it to the record's index before sorting, and using it as the least significant part of the sort key.

This has a slightly adverse effect on the time taken, but it doesn't affect the time complexity of the algorithm. It also has a minimal storage overhead for each record, but that rarely matters until you get very large numbers of records (and it is minimized with larger record sizes).

I've used this method with C's qsort() function to avoid writing my own. Each record has a 32-bit integer added and populated with the starting sequence number before calling qsort().

Then the comparison function checked the keys and the sequence (this guarantees there are no duplicate keys), turning the quicksort into a stable one. I recall that it still outperformed the inherently stable mergesort for the data sets I was using.

Your mileage may vary, so always remember: Measure, don't guess!
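A minimal Java sketch of the sequence-number idea, assuming plain `int` keys. Instead of adding a field to each record, it packs each key together with its original index into one `long`, so no two sort keys are ever equal; `Arrays.sort(long[])` (documented as a Dual-Pivot Quicksort, not a stable algorithm) then yields a stable ordering. The class and method names are hypothetical, and, like the approach described above, it costs extra memory per element (here a separate `long[]`).

```java
import java.util.Arrays;

public class SequenceTieBreak {
    // Returns the original indices of `keys`, listed in stable ascending key order.
    public static int[] stableOrder(int[] keys) {
        long[] packed = new long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            // High 32 bits: the (signed) key. Low 32 bits: the original index,
            // acting as the "sequence field" that breaks ties.
            packed[i] = ((long) keys[i] << 32) | i;
        }
        // Every packed value is distinct, so the result is the same whether or not
        // the underlying sort is stable.
        Arrays.sort(packed);
        int[] order = new int[keys.length];
        for (int i = 0; i < packed.length; i++) {
            order[i] = (int) packed[i]; // keep only the low 32 bits: the original index
        }
        return order;
    }
}
```

For keys `{3, 1, 3, 1, 2}` this returns `{1, 3, 4, 0, 2}`: the two 1s and the two 3s keep their original relative order. The records themselves can then be visited or permuted in that order.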

ReaperUnreal · Accepted Answer · 2009-08-17 20:04:02Z

1

There's a nice list of sorting algorithms on Wikipedia that can help you find whatever type of sort you're after.

For example, to address your specific question, it looks like an in-place merge sort is what you want.

However, you might also want to take a look at strand sort, it's got some very interesting properties.
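For reference, a minimal strand sort sketch in Java (class name hypothetical). It is list-based, so it does not meet the in-place requirement. It repeatedly pulls a non-decreasing "strand" out of the remaining input and merges it into the result; the worst case is O(n^2) (for example, reverse-sorted input), but already-sorted input forms a single strand and is handled in linear time. When ties are resolved in favour of earlier strands, as below, the sort is stable.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class StrandSort {
    // Returns a new, sorted list; the input list is not modified.
    public static <T> List<T> sort(List<T> input, Comparator<? super T> cmp) {
        LinkedList<T> remaining = new LinkedList<>(input);
        LinkedList<T> result = new LinkedList<>();
        while (!remaining.isEmpty()) {
            // Greedily pull out a non-decreasing strand, starting from the first remaining element.
            LinkedList<T> strand = new LinkedList<>();
            strand.add(remaining.removeFirst());
            Iterator<T> it = remaining.iterator();
            while (it.hasNext()) {
                T x = it.next();
                if (cmp.compare(x, strand.getLast()) >= 0) {
                    strand.add(x);
                    it.remove();
                }
            }
            result = merge(result, strand, cmp);
        }
        return result;
    }

    // Merges two sorted lists; on ties it takes from `a` (earlier strands) first,
    // which is what keeps the overall sort stable.
    private static <T> LinkedList<T> merge(LinkedList<T> a, LinkedList<T> b, Comparator<? super T> cmp) {
        LinkedList<T> out = new LinkedList<>();
        while (!a.isEmpty() && !b.isEmpty()) {
            if (cmp.compare(b.getFirst(), a.getFirst()) < 0) {
                out.add(b.removeFirst());
            } else {
                out.add(a.removeFirst());
            }
        }
        out.addAll(a);
        out.addAll(b);
        return out;
    }
}
```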

answered Aug 17, 2009 at 20:04

ReaperUnreal

nik3daz · Accepted Answer · 2009-11-04 16:51:29Z

1

Quicksort can be made stable by doing it on a linked list. Picking a random or median-of-3 pivot then costs O(n) per step, but with a very small constant (a list traversal).

By splitting the list and ensuring that the left list is sorted so same values go left and the right list is sorted so the same values go right, the sort will be implicitly stable for no real extra cost. Also, since this deals with assignment rather than swapping, I think the speed might actually be slightly better than a quicksort on an array, since there's only a single write.

So, in conclusion: put all your items in a list and run quicksort on the list.
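A minimal Java sketch of a stable, list-based quicksort. It uses a three-way split (less than, equal to, greater than the pivot, each bucket kept in input order), which is a variant of, not necessarily identical to, the left/right scheme described above; the class name is hypothetical. It needs O(n) extra space for the partition lists and can recurse O(n) deep on unlucky pivots, so it does not meet the question's space constraints.

```java
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;

public class ListQuicksort {
    // Returns a new sorted list; the input is not modified.
    public static <T> List<T> sort(List<T> input, Comparator<? super T> cmp) {
        if (input.size() <= 1) {
            return new LinkedList<>(input);
        }
        // Middle element as pivot (an O(n) walk on a linked list).
        T pivot = input.get(input.size() / 2);
        List<T> less = new LinkedList<>();
        List<T> equal = new LinkedList<>();
        List<T> greater = new LinkedList<>();
        // One pass; each element is appended, so every bucket keeps the input order.
        for (T x : input) {
            int c = cmp.compare(x, pivot);
            if (c < 0) {
                less.add(x);
            } else if (c > 0) {
                greater.add(x);
            } else {
                equal.add(x);
            }
        }
        // Equal elements stay together, in their original order, between the two halves.
        List<T> result = sort(less, cmp);
        result.addAll(equal);
        result.addAll(sort(greater, cmp));
        return result;
    }
}
```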

answered Nov 4, 2009 at 16:51

nik3daz

Ryan · Accepted Answer · 2008-09-22 18:20:17Z

0

Perhaps Shell sort? If I recall my data structures course correctly, it tended to be stable, but its worst-case time is O(n log^2 n), although it runs in roughly O(n) time on almost-sorted data. It's based on insertion sort, so it sorts in place.
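A minimal Shell sort sketch in Java using Pratt's gap sequence (all numbers of the form 2^p * 3^q below n), the sequence known to give O(n log^2 n) worst-case time; the class name is hypothetical. Because the gapped passes move an element past others that sit `gap` positions away, equal keys can be reordered, so Shell sort is not stable in general.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShellSortPratt {
    // Pratt's gaps 2^p * 3^q below n, largest first.
    public static List<Integer> prattGaps(int n) {
        List<Integer> gaps = new ArrayList<>();
        for (long p2 = 1; p2 < n; p2 *= 2) {
            for (long g = p2; g < n; g *= 3) {
                gaps.add((int) g);
            }
        }
        gaps.sort(Collections.reverseOrder());
        return gaps;
    }

    // Sorts `a` in place; the final pass always uses gap 1 (plain insertion sort).
    public static void sort(int[] a) {
        for (int gap : prattGaps(a.length)) {
            // Gapped insertion sort: compare each element with the one `gap` positions back.
            for (int i = gap; i < a.length; i++) {
                int x = a[i];
                int j = i;
                while (j >= gap && a[j - gap] > x) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = x;
            }
        }
    }
}
```

Processing the gaps in descending order matters for the bound: by the time gap g is used, the array is already 2g- and 3g-sorted, so each element moves at most one step per pass.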

edited Sep 22, 2008 at 18:20

answered Sep 22, 2008 at 3:27

Ryan

2 Comments

leppie Over a year ago

So it's sometimes stable? I think that is the exact definition of unstable :)

Ryan Over a year ago

Sometimes is different than usually :)

I answer wrong - have fun · Accepted Answer · 2008-09-23 15:15:03Z

0

Don't worry too much about O(n log n) until you can demonstrate that it matters. If you can find an O(n^2) algorithm with a drastically lower constant, go for it!

The general worst-case scenario is not relevant if your data is highly constrained.

In short: run some tests.

answered Sep 23, 2008 at 15:15

I answer wrong - have fun

1 Comment

dsimcha Over a year ago

I agree with phyzome in general: big-O doesn't matter unless N has a decent chance of being large. However, what I'm trying to do is write a space-efficient associative array to fit large amounts of data in RAM, so the whole point is that N is huge.

Thomas Mueller · Accepted Answer · 2025-06-02 09:03:38Z

I have implemented a stable in-place quicksort and a stable in-place merge sort. The merge sort is a bit faster and is guaranteed to run in O(n*log(n)^2) time; the quicksort is not. Both use O(log(n)) space. A slightly shorter version of the stable merge sort is:

import java.util.Comparator;

public class MergeSort {
    public static <T> void sort(T[] data, Comparator<? super T> comp) {
        sort(data, comp, 0, data.length);
    }
    // Sorts d[from, to): small ranges with insertion sort, larger ones by halving and merging.
    private static <T> void sort(T[] d, Comparator<? super T> comp, int from, int to) {
        if (to - from < 30) {
            insertionSort(d, comp, from, to);
            return;
        }
        int mid = from + (to - from) / 2;
        sort(d, comp, from, mid);
        sort(d, comp, mid, to);
        merge(d, comp, from, mid, to, mid - from, to - mid);
    }
    // Stable insertion sort of d[from, to): an element only moves past strictly greater ones.
    private static <T> void insertionSort(T[] d, Comparator<? super T> comp, int from, int to) {
        for (int i = from + 1; i < to; i++) {
            T x = d[i];
            int j = i - 1;
            while (j >= from && comp.compare(d[j], x) > 0) {
                d[j + 1] = d[j];
                j--;
            }
            d[j + 1] = x;
        }
    }
    // Returns the first index in [from, to) whose element is not less than d[val]
    // (smaller == 0, a lower bound) or is greater than d[val] (smaller == 1, an upper bound).
    private static <T> int binarySearch(T[] d, Comparator<? super T> comp, int from, int to, int val, int smaller) {
        int len = to - from;
        while (len > 0) {
            int half = len / 2;
            int mid = from + half;
            if (comp.compare(d[mid], d[val]) < smaller) {
                from = mid + 1;
                len = len - half - 1;
            } else {
                len = half;
            }
        }
        return from;
    }
    // Reverses d[from..to], both ends inclusive.
    private static <T> void reverse(T[] d, int from, int to) {
        while (from < to) {
            swap(d, from++, to--);
        }
    }
    // Merges the sorted runs d[from, pivot) and d[pivot, to) in place and stably:
    // cut both runs, rotate the middle section with three reversals, then recurse on each side.
    private static <T> void merge(T[] d, Comparator<? super T> comp, int from, int pivot, int to, int len1, int len2) {
        if (len1 == 0 || len2 == 0) {
            return;
        }
        if (len1 + len2 == 2) {
            if (comp.compare(d[pivot], d[from]) < 0) {
                swap(d, pivot, from);
            }
            return;
        }
        int len1b = len1 / 2;
        int len2b = len2 / 2;
        int firstCut = from + len1b;
        int secondCut = pivot + len2b;
        if (len1 > len2) {
            secondCut = binarySearch(d, comp, pivot, to, firstCut, 0);
            len2b = secondCut - pivot;
        } else {
            firstCut = binarySearch(d, comp, from, pivot, secondCut, 1);
            len1b = firstCut - from;
        }
        if (firstCut != pivot && pivot != secondCut) {
            reverse(d, firstCut, pivot - 1);
            reverse(d, pivot, secondCut - 1);
            reverse(d, firstCut, secondCut - 1);
        }
        int mid = firstCut + len2b;
        merge(d, comp, from, firstCut, mid, len1b, len2b);
        merge(d, comp, mid, secondCut, to, len1 - len1b, len2 - len2b);
    }
    private static <T> void swap(T[] d, int i, int j) {
        T temp = d[i];
        d[i] = d[j];
        d[j] = temp;
    }
}

By the way, it might be possible to create more than two partitions. Also, smaller arrays should be sorted with a different algorithm (for example insertion sort). The algorithm above is just a starting point really.

Mike Dunlavey · Accepted Answer · 2009-08-17 19:44:33Z

-1

Maybe I'm in a bit of a rut, but I like hand-coded merge sort. It's simple, stable, and well-behaved. The additional temporary storage it needs is only N*sizeof(int), which isn't too bad.

answered Aug 17, 2009 at 19:44

Mike Dunlavey
