Fastest way to sort in Python

Question

What is the fastest way to sort an array of whole integers greater than 0 and less than 100000 in Python, without using built-in functions like sort?

I'm looking at the possibility of combining two sort functions depending on input size.

@Anders: Don't reinvent the wheel. The built-in sort() should suffice for your case. – user225312, Commented Oct 4, 2010 at 13:19
I'm curious: why do you need to implement your own sorting routine? It smells like a homework assignment to me :-) – Rob Vermeulen, Commented Oct 4, 2010 at 13:31

Jared Burrows · Accepted Answer · 2013-10-20 03:32:54Z

29

If you are interested in asymptotic time, then counting sort or radix sort provide good performance.
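As an illustration of the radix option, here is a minimal least-significant-digit (LSD) radix sort sketch for non-negative integers such as the question's 0 < x < 100000. The name radix_sort and the base parameter are illustrative choices, not code from this answer. Each pass distributes the numbers into buckets by one digit, keeping their previous order, starting from the least significant digit.

```python
def radix_sort(items, base=10):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    result = list(items)
    if not result:
        return result
    largest = max(result)
    place = 1  # weight of the digit examined in the current pass
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for value in result:
            # Appending keeps the order produced by the previous pass (stability),
            # which is what makes least-significant-digit-first correct.
            buckets[(value // place) % base].append(value)
        result = [value for bucket in buckets for value in bucket]
        place *= base
    return result
```

For example, radix_sort([170, 45, 75, 90, 802, 24, 2, 66]) gives [2, 24, 45, 66, 75, 90, 170, 802]. A larger base (e.g. 256) means fewer passes but more buckets per pass.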

However, if you are interested in wall-clock time, you will need to compare the performance of different algorithms on your particular data sets, since algorithms perform differently on different data. In that case, it's always worth trying quicksort:

def qsort(inlist):
    # Base case: an empty list is already sorted.
    if not inlist:
        return []
    pivot = inlist[0]
    # Partition the remaining elements around the first element.
    lesser = qsort([x for x in inlist[1:] if x < pivot])
    greater = qsort([x for x in inlist[1:] if x >= pivot])
    return lesser + [pivot] + greater

Source: http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Python
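The qsort above scans inlist[1:] twice per call, once per list comprehension. A minimal variant that partitions in a single pass might look like this (the name qsort_one_pass is hypothetical). It still picks the first element as pivot, so an already-sorted input recurses once per element and will exceed Python's default recursion limit (about 1000 frames) on sorted lists of that size.

```python
def qsort_one_pass(items):
    """Out-of-place quicksort that partitions around the first element in one pass."""
    if not items:
        return []
    pivot = items[0]
    lesser, greater = [], []
    for x in items[1:]:
        # Same rule as qsort: elements equal to the pivot go to the right side.
        (lesser if x < pivot else greater).append(x)
    return qsort_one_pass(lesser) + [pivot] + qsort_one_pass(greater)
```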

edited Oct 20, 2013 at 3:32

Jared Burrows

answered Oct 4, 2010 at 13:23

fmark

3 Comments

Tony Veijalainen Over a year ago

Good advice, except for the choice of list as a variable name, which can cause nasty errors. I'll post another, faster version.

Paul McMillan Over a year ago

Running the list comprehension twice over the same set of variables is probably also less than optimal.

fmark Over a year ago

@Tony Veijalainen “There are only two hard things in Computer Science: cache invalidation and naming things” -- I've changed the variable name

codaddict · Accepted Answer · 2010-10-04 13:16:47Z

8

Since you know the range of numbers, you can use counting sort, which runs in linear time: O(n + k) for n elements drawn from k possible values.
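A minimal counting sort sketch, assuming non-negative integers. The function name is illustrative, and sizing the count table by the largest value actually present (instead of a fixed 100000) is our own choice: it reduces wasted memory when all values are small, though not when a short list contains a large value.

```python
def counting_sort(items):
    """Counting sort for non-negative integers; returns a new sorted list."""
    if not items:
        return []
    # One counter per possible value from 0 up to the largest value present.
    counts = [0] * (max(items) + 1)
    for value in items:
        counts[value] += 1
    result = []
    # Emit each value as many times as it was counted, in increasing order.
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result
```

For example, counting_sort([5, 3, 99999, 3, 1]) returns [1, 3, 3, 5, 99999].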

answered Oct 4, 2010 at 13:16

codaddict

1 Comment

Brian Over a year ago

(I did not downvote). Note that this is not a good algorithm if the array of integers is significantly smaller than 100000, as it will waste memory (and thus time) to construct the 100000 element list.

Tauquir · Accepted Answer · 2010-10-04 13:54:15Z

5

Early versions of Python used a hybrid of samplesort (a variant of quicksort with a large sample size) and binary insertion sort as the built-in sorting algorithm. This proved to be somewhat unstable. So, from Python 2.3 onward, Python uses an adaptive mergesort algorithm (Timsort).

Merge sort is O(n log n) both on average and in the worst case, whereas quicksort's worst case is O(n^2).
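To see where quicksort's O(n^2) worst case comes from, this sketch counts pivot comparisons for a first-element-pivot quicksort like the qsort in the first answer, counting one comparison per element per partitioning step (the helper count_comparisons is hypothetical). On already-sorted input every partition is maximally lopsided, so the total is (n-1) + (n-2) + ... + 1 = n(n-1)/2.

```python
def count_comparisons(items):
    """Pivot comparisons made by a first-element-pivot quicksort."""
    if len(items) <= 1:
        return 0
    pivot, rest = items[0], items[1:]
    lesser = [x for x in rest if x < pivot]
    greater = [x for x in rest if x >= pivot]
    # Every element of rest is compared against the pivot once in this step.
    return len(rest) + count_comparisons(lesser) + count_comparisons(greater)


n = 500
worst = count_comparisons(list(range(n)))  # already-sorted input
print(worst, n * (n - 1) // 2)  # prints: 124750 124750
```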

If you have a list, list=[ .............. ], then list.sort() uses this mergesort-based algorithm.
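One practical property of the built-in list.sort() is that it is guaranteed stable: items that compare equal keep their original relative order. A small illustration with made-up records, sorting by score only:

```python
# Hypothetical records: (name, score). Sort by score only.
records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]
records.sort(key=lambda record: record[1])
# Equal scores keep their input order: alice before dave, carol before bob.
print(records)  # [('alice', 1), ('dave', 1), ('carol', 2), ('bob', 2)]
```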

For a comparison of sorting algorithms, you can read the Wikipedia article.

For a detailed comparison, see comp

edited Oct 4, 2010 at 13:54

answered Oct 4, 2010 at 13:27

Tauquir

2 Comments

aaronasterling Over a year ago

It's Timsort, which is more adaptive than mergesort.

Tauquir Over a year ago

Timsort is an adaptive, stable, natural mergesort.

Magnus · Accepted Answer · 2010-10-04 22:51:49Z

4

Radix sort theoretically runs in linear time (sort time grows roughly in direct proportion to array size), but in practice quicksort is probably more suitable unless you're sorting absolutely massive arrays.

If you want to make quicksort a bit faster, you can switch to insertion sort when the subarray size becomes small.
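A minimal sketch of that hybrid, which is also one way to "combine two sort functions depending on input size" as the question asks. The function names, the middle-element Lomuto partition, and the default cutoff of 10 are illustrative assumptions; the cutoff should be tuned by experiment. Lomuto partitioning degrades toward O(n^2) time on inputs with many duplicate values.

```python
def insertion_sort(items, lo, hi):
    """Sort items[lo..hi] (inclusive) in place by insertion."""
    for i in range(lo + 1, hi + 1):
        current = items[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for current.
        while j >= lo and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current


def hybrid_quicksort(items, lo=0, hi=None, cutoff=10):
    """In-place quicksort that hands ranges of size <= cutoff to insertion sort."""
    if hi is None:
        hi = len(items) - 1
    while hi - lo + 1 > cutoff:
        # Lomuto partition around the middle element (moved to the end first).
        mid = (lo + hi) // 2
        items[mid], items[hi] = items[hi], items[mid]
        pivot = items[hi]
        store = lo
        for k in range(lo, hi):
            if items[k] < pivot:
                items[k], items[store] = items[store], items[k]
                store += 1
        items[store], items[hi] = items[hi], items[store]
        # Recurse into the smaller side and loop on the larger one to bound stack depth.
        if store - lo < hi - store:
            hybrid_quicksort(items, lo, store - 1, cutoff)
            lo = store + 1
        else:
            hybrid_quicksort(items, store + 1, hi, cutoff)
            hi = store - 1
    insertion_sort(items, lo, hi)
    return items
```

Usage: hybrid_quicksort(data) sorts data in place and also returns it for convenience.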

It would probably be helpful to understand the concepts of algorithmic complexity and Big-O notation too.

answered Oct 4, 2010 at 22:51

Magnus

3 Comments

Anders Over a year ago

When you say the array size becomes small do you mean less than 64 in size?

Magnus Over a year ago

I'd say more about less than 10, but there's no right answer; the best idea is to experiment with different values and see which ends up faster.

Sidharth Ghoshal Over a year ago

Radix sort is $O(kn)$, where $k$ is the number of digits in your integers. But the number of digits of an integer $R$ in base $b$ is $O(\log_b R)$, so radix sort's "linear time" is extremely misleading. It's asymptotically no different from a comparison sort. If it were, we could convert comparable items to integers and then radix sort them to beat the $O(n \log n)$ lower bound for sorting.
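A worked count of passes for this question's bounds, using the relation between digits and base from the comment (the bases 10 and 256 are illustrative choices). An integer $R \ge 1$ has $\lceil \log_b(R+1) \rceil$ digits in base $b$, so for the largest allowed value $R = 99999$:

$$
k = \lceil \log_b (R+1) \rceil, \qquad
k_{10} = \lceil \log_{10} 100000 \rceil = 5, \qquad
k_{256} = \lceil \log_{256} 100000 \rceil = 3 \quad (256^2 = 65536 \le 99999 < 256^3).
$$

Each pass costs $O(n + b)$, so with the range fixed by the question the total is $O(k(n + b))$ with $k$ a small constant; once the range is allowed to grow, $k$ grows like $\log R$.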

Marek · Accepted Answer · 2017-03-08 19:13:24Z

3

I might be a little late to the show, but there's an interesting article that compares different sorts at https://www.linkedin.com/pulse/sorting-efficiently-python-lakshmi-prakash

One of the main takeaways is that while the default sort does great, we can do a little better with a compiled version of quicksort. This requires the Numba package.

Here's a link to the Github repo: https://github.com/lprakash/Sorting-Algorithms/blob/master/sorts.ipynb

answered Mar 8, 2017 at 19:13

Marek

1 Comment

Constantin Hong Over a year ago

So default sort is kicking.

Rajan · Accepted Answer · 2010-10-04 22:43:11Z

1

We can implement counting sort with a dictionary to minimize additional space usage while keeping the running time low. Counting sort is much slower than the built-in sort for small input arrays because of the overhead of pure Python versus the C implementation. Counting sort starts to overtake the built-in sort when the size of the array (COUNT) is about 1 million.

If you really want large speedups for smaller inputs, implement counting sort in C and call it from Python.

(Fixed a bug that Aaron (+1) helped catch.) The pure-Python implementation below compares the two approaches:

import random
import time

COUNT = 3000000

array = [random.randint(1, 100000) for i in range(COUNT)]
random.shuffle(array)

# Baseline: the built-in (C) sort on a copy of the data.
array1 = array[:]

start = time.time()
array1.sort()
end = time.time()
time1 = end - start
print('Time to sort =', time1 * 1000, 'ms')

# Counting sort with a dict mapping value -> number of occurrences.
array2 = array[:]

start = time.time()
ardict = {}
for a in array2:
    try:
        ardict[a] += 1
    except KeyError:
        ardict[a] = 1

# Rewrite array2 in key order, repeating each value by its count.
indx = 0
for a in sorted(ardict.keys()):
    b = ardict[a]
    array2[indx:indx + b] = [a for i in range(b)]
    indx += b

end = time.time()
time2 = end - start
print('Time to count sort =', time2 * 1000, 'ms')

print('Ratio =', time2 / time1)
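A more compact sketch of the same dictionary-based idea, using the standard library's collections.Counter (a dict subclass) instead of the try/except counting loop. The name counter_sort is hypothetical and this version has not been benchmarked against the code above, so no speed claim is implied.

```python
from collections import Counter


def counter_sort(items):
    """Dictionary-based counting sort; returns a new sorted list."""
    counts = Counter(items)  # value -> number of occurrences
    result = []
    # Only the distinct keys are sorted, which is cheap when there are
    # few distinct values relative to len(items).
    for value in sorted(counts):
        result.extend([value] * counts[value])
    return result
```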

edited Oct 4, 2010 at 22:43

answered Oct 4, 2010 at 20:26

Rajan

6 Comments

aaronasterling Over a year ago

+1. Ratio = 1.16710428623 on my machine. Clever use of a dict. It's worth noting, though, that changing the dict construction phase from try: ardict[a] += 1; except: ardict[a] = 1 to if a in ardict: ardict[a] += 1; else: ardict[a] = 1 drops the ratio to 0.696179723863. Sometimes (often) it is better to look before you leap. I knew to do this because try is only cheaper than if when the exception rarely occurs. An actual exception is still very expensive.

aaronasterling Over a year ago

Unfortunately, this algorithm is wrong. Try array = [1, 10, 100, 1000, 10000, 100000, 1000000]. The dangers of skating on undocumented implementation details strike again.

Rajan Over a year ago

Thanks, Aaron. I fixed the bug of not sorting the dict keys. That should slow it down a bit. However, it will keep its almost O(n) behavior if the number of distinct elements is low compared to the array size. I would love to see a 3D plot with the number of distinct elements and the array length as the x and y dimensions and the ratio of running times as the third dimension. Maybe I will do it in a day or two.

Rajan Over a year ago

@Aaron: On my computer, the (try, except) version works better. I coded it the (if, then, else) way first and switched to (try, except) to speed things up. Also, the version with the bug removed runs faster than the previous one, because the underlying C sort is used to sort the keys. That's free beer right there. :-)

Tony Veijalainen Over a year ago

For me the except solution was also faster. Generation was a little faster using a tuple built from a generator, if you don't need to produce a list (which is mutable): array2 = tuple(a for a in sorted(ardict) for i in range(ardict[a]))

Hannesh · Accepted Answer · 2010-10-04 13:19:02Z

0

The built-in functions are best, but since you can't use them, have a look at this:

http://en.wikipedia.org/wiki/Quicksort

answered Oct 4, 2010 at 13:19

Hannesh

Comments

user2812083 · Accepted Answer · 2013-09-24 16:59:22Z

0

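def sort(l):
    p = 0
    while p < len(l) - 1:
        if l[p] > l[p + 1]:
            # Out of order: swap, then step back to recheck the previous pair.
            l[p], l[p + 1] = l[p + 1], l[p]
            if p != 0:
                p -= 1
        else:
            # In order: move forward.
            p += 1
    return l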

This is an algorithm that I created, and it is really fast. Just call sort(l), where l is the list you want to sort.

answered Sep 24, 2013 at 16:59

user2812083

1 Comment

glotchimo Over a year ago

That's a bubble sort

asdf · Accepted Answer · 2016-12-16 05:14:32Z

@fmark: Here is some benchmarking of a Python merge sort implementation I wrote against the Python quicksorts from http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Python and from the top answer.

The size of the list and the size of the numbers in the list are irrelevant.

Merge sort wins; however, it uses the built-in int() to floor the midpoint.

import numpy as np
x = list(np.random.rand(100))


# TEST 1, merge_sort 
def merge(l, p, q, r):
    n1 = q - p + 1
    n2 = r - q
    left = l[p : p + n1]
    right = l[q + 1 : q + 1 + n2]

    i = 0
    j = 0
    k = p
    while k < r + 1:
        if i == n1:
            l[k] = right[j]
            j += 1
        elif j == n2:
            l[k] = left[i]
            i += 1
        elif  left[i] <= right[j]:
            l[k] = left[i]
            i += 1
        else:
            l[k] = right[j]
            j += 1
        k += 1

def _merge_sort(l, p, r):
    if p < r:
        q = int((p + r)/2)
        _merge_sort(l, p, q)
        _merge_sort(l, q+1, r)
        merge(l, p, q, r)

def merge_sort(l):
    _merge_sort(l, 0, len(l)-1)

# TEST 2
def quicksort(array):
    _quicksort(array, 0, len(array) - 1)

def _quicksort(array, start, stop):
    if stop - start > 0:
        pivot, left, right = array[start], start, stop
        while left <= right:
            while array[left] < pivot:
                left += 1
            while array[right] > pivot:
                right -= 1
            if left <= right:
                array[left], array[right] = array[right], array[left]
                left += 1
                right -= 1
        _quicksort(array, start, right)
        _quicksort(array, left, stop)

# TEST 3
def qsort(inlist):
    if inlist == []: 
        return []
    else:
        pivot = inlist[0]
        lesser = qsort([x for x in inlist[1:] if x < pivot])
        greater = qsort([x for x in inlist[1:] if x >= pivot])
        return lesser + [pivot] + greater

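# merge_sort and quicksort sort x in place, so each test sorts a fresh copy;
# otherwise every run after the first would be timing already-sorted data.
def test1():
    merge_sort(list(x))

def test2():
    quicksort(list(x))

def test3():
    qsort(list(x))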

if __name__ == '__main__':
    import timeit
    print('merge_sort:', timeit.timeit("test1()", setup="from __main__ import test1, x;", number=10000))
    print('quicksort:', timeit.timeit("test2()", setup="from __main__ import test2, x;", number=10000))
    print('qsort:', timeit.timeit("test3()", setup="from __main__ import test3, x;", number=10000))

jph · Accepted Answer · 2021-11-16 23:51:58Z

0

Bucket sort with bucket size = 1. Memory is O(m), where m is the range of values being sorted. Running time is O(n + m), where n is the number of items being sorted; for a fixed range this is O(n). When the integer type used to record counts is bounded, this approach will fail if any value appears more than MAXINT times.

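def sort(items):
    # One counter per possible value; assumes 0 <= item < 100000.
    seen = [0] * 100000
    for item in items:
        seen[item] += 1
    # Write each value back, in increasing order, as many times as it was seen.
    index = 0
    for value, count in enumerate(seen):
        for _ in range(count):
            items[index] = value
            index += 1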

answered Nov 16, 2021 at 23:51

jph
