
In the post Why is processing a sorted array faster than a random array, it says that branch prediction is the reason for the performance boost on sorted arrays.

But I just tried the example in Python, and I see no difference between sorted and random arrays (I tried both bytearray and array, and used line_profiler to profile the computation).

Am I missing something?

Here is my code:

from array import array
import random
array_size = 1024
loop_cnt = 1000
# I also tried 'array', and it's almost the same
a = bytearray()  # start empty: bytearray(array_size) would add array_size zero bytes first
for i in xrange(array_size):
    a.append(random.randint(0, 255))
#sorted                                                                         
a = sorted(a)
@profile
def computation():
    sum = 0
    for i in xrange(loop_cnt):
        for j in xrange(array_size):
            if a[j] >= 128:
                sum += a[j]

computation()
print 'done'
    sorted(a) returns another list that is sorted, but it doesn't modify a. To even make the code do what you think it does, you'd have to do a = sorted(a), or better yet a.sort() instead. Commented Oct 11, 2012 at 15:10
  • You might want to look at the results for python here stackoverflow.com/a/18419405/1903116 Commented Aug 24, 2013 at 14:09
  • stackoverflow.com/q/11227809/3145716 check this; it might help. Commented Apr 29, 2014 at 10:39
  • Python uses Timsort, which may have some influence... FWIW. Commented Nov 14, 2014 at 18:14
  • @rogerdpack: the sorting algorithm does not matter; all stable algorithms produce the same result. The sorting time is not profiled here. Commented Mar 14, 2015 at 17:43

5 Answers


I may be wrong, but I see a fundamental difference between the linked question and your example: Python interprets bytecode, while C++ compiles to native code.

In the C++ code, that if translates directly to a cmp/jl sequence, which the CPU's branch predictor can treat as a single "prediction spot" specific to that loop.

In Python, that comparison is actually several function calls, so there is (1) more overhead, and (2) I suppose the code that performs the comparison is a function inside the interpreter that is used for every other integer comparison. That makes it a "prediction spot" that is not specific to the current block, which gives the branch predictor a much harder time guessing correctly.
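
One way to see how much machinery hides behind that single comparison is to disassemble the loop body with the standard dis module (a quick sketch; the function below is an illustration, not the question's exact code):

import dis

def body(a, j, total):
    if a[j] >= 128:
        total += a[j]
    return total

# The single `if` expands to LOAD_FAST, BINARY_SUBSCR, LOAD_CONST,
# COMPARE_OP and POP_JUMP_IF_FALSE, each dispatched through the
# interpreter's main loop, rather than compiling to a lone cmp/jl pair.
dis.dis(body)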


Edit: also, as outlined in this paper, there are way more indirect branches inside an interpreter, so such an optimization in your Python code would probably be buried anyway by the branch mispredictions in the interpreter itself.




Two reasons:

  • Your array size is much too small to show the effect; see the timing sketch after this list.
  • Python has more interpreter overhead than C, so the effect will be less noticeable overall.
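
A minimal sketch of the first point, using a much larger array (the sizes here are arbitrary, and on CPython the gap, if any, may still be small because of the interpreter overhead described in the second point):

import random
import timeit

def total(data):
    # same computation as the question: sum the elements >= 128
    s = 0
    for x in data:
        if x >= 128:
            s += x
    return s

data = [random.randint(0, 255) for _ in xrange(1 << 20)]  # ~1M elements
sorted_data = sorted(data)

print 'random:', timeit.timeit(lambda: total(data), number=10)
print 'sorted:', timeit.timeit(lambda: total(sorted_data), number=10)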

4 Comments

This program takes 1.5 seconds on my MacBook Air; a bigger array would take too much time, and I just don't want to wait.
"I just don't want to wait" So you prefer we do it for you...?
@dda Sorry, I meant that the function already takes 1.5 seconds with the configuration above; if we got a performance boost from the sorted array, we would definitely see it. I actually did make the array size 10 times bigger, and the loop count 10 times bigger; the execution time increases linearly.
I did a test on my MBP, multiplying array_size and loop_cnt by 10, and here's the result: random array: 9.97857904434, sorted array: 7.98291707039.

I ported the original code to Python and ran it with PyPy. I can confirm that sorted arrays are processed faster than unsorted arrays, and that the branchless method also eliminates the branch, with a running time similar to that of the sorted array. I believe this is because PyPy is a JIT compiler, so branch prediction actually comes into play.

[edit]

Here's the code I used:

import random
import time

def runme(data):
  sum = 0
  start = time.time()

  for i in xrange(100000):
    for c in data:
      if c >= 128:
        sum += c

  end = time.time()
  print end - start
  print sum

def runme_branchless(data):
  sum = 0
  start = time.time()

  for i in xrange(100000):
    for c in data:
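      # t is -1 when c < 128 and 0 when c >= 128 (Python's >> is an
      # arithmetic shift, so the sign of c - 128 is preserved)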
      t = (c - 128) >> 31
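      # ~t is all ones when c >= 128 and 0 otherwise, so this
      # adds either c or 0 without a branch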
      sum += ~t & c

  end = time.time()
  print end - start
  print sum

data = list()

for i in xrange(32768):
  data.append(random.randint(0, 255))  # 0..255, matching the original C++ rand() % 256

sorted_data = sorted(data)
runme(sorted_data)
runme(data)
runme_branchless(sorted_data)
runme_branchless(data)

1 Comment

On an MBP with a 2.53 GHz Intel Core 2 Duo and PyPy 1.9.0, the results are: branch, random: 36.2439880371 s; branch, sorted: 18.3833880424 s; branchless, random: 13.1689388752 s; branchless, sorted: 12.3706789017 s.

sorted() returns a new sorted list rather than sorting in place, so you were actually measuring the same (unsorted) array twice.
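
A quick illustration of the difference (standard Python behavior):

a = [3, 1, 2]
print sorted(a)  # [1, 2, 3], a new list
print a          # [3, 1, 2], the original is unchanged
a.sort()         # sorts in place
print a          # [1, 2, 3]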

1 Comment

I just changed it to "a = sorted(a)"; it's still the same

The reason the performance improves drastically when the data are sorted is that the branch-prediction penalty is removed, as explained beautifully in Mysticial's answer.

