2

I have the following script, and I know about the principle "Branch prediction" but it seems that's not the case here.

Why is it faster to process a sorted array than an unsorted array?

It seems to work the other way around.

When I run the following script without the sort($data) the script takes 193.23883700371 seconds to complete.

When I enable the sort($data) line the scripts takes 300.26129794121 seconds to complete.

Why is it so much slower in PHP? I used PHP 5.5 and 5.6. In PHP 7 the script is faster when the sort() is not commented out.

<?php

$size = 32768;
$data = array_fill(0, $size, null);

for ($i = 0; $i < $size; $i++) {
    $data[$i] = rand(0, 255);
}

// Improved performance when disabled
//sort($data);

$total = 0;

$start = microtime(true);

for ($i = 0; $i < 100000; $i++) {
    for ($x = 0; $x < $size; $x++) {
        if ($data[$x] >= 127) {
            $total += $data[$x];
        }
    }
}

$end = microtime(true);
echo($end - $start);
19
  • 2
    There is no earthly reason why it should be slower processing a sorted array compared with an unsorted array; as far as PHP is concerned, they are simply arrays Commented May 12, 2016 at 15:54
  • 1
    @SanderVisser How many times did you run the script? Maybe when you sorted the array, there were more digits above 127, so it took longer to sum it to the total? Commented May 12, 2016 at 15:55
  • 3
    In PHP an array is an ordered map. The likely culprit is that during creation the elements are relatively aligned in memory so that $data[1] is close to $data[2] the sort may just then be reassigning the index key values without changing the memory locations of the values. Now $data[1] may be far from $data[2] as a result there are memory cache misses when accessing the values, and time is lost loading the data from memory. Commented May 12, 2016 at 15:57
  • 3
    In PHP everything is a Hash table, including arrays. Sorted or not shouldn't make any difference, I think. rand() and the >= 127 condition will surely affect performance. Commented May 12, 2016 at 16:10
  • 1
    I notice that the results are similar in PHP 5 if you replace in the test the sort call with a shuffle call. So this means tampering with the original order degrades performance in PHP 5. Commented May 12, 2016 at 16:23

2 Answers 2

2

Based on my comments above the solution is to either find or implement a sort function that moves the values so that memory remains contiguous and gives you the speedup, or push the values from the sorted array into a second array so that the new array has contiguous memory.

Sign up to request clarification or add additional context in comments.

Comments

1

Assuming you MEANT to not time the actual sort, since your code doesn't time that action, it's difficult to assess any true performance difference because you've filled the array with random data. This means that one pass might have MANY more values greater than or equal to 127 (and thus running an additional command) then another pass. To really compare the two, fill your array with an identical set of fixed data. Otherwise, you'll never know if the random fill is causing the time differences you're seeing.

1 Comment

I can update the code to reuse the exact same array, and it will perform slower after a sort() is called.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.