
I have just tried to benchmark std::sort on both a std::vector<std::pair<float, unsigned int>> (filled with push_back) and a plain std::pair<float, unsigned int> * array (allocated using new and then filled one by one). The compare function just compared the float parts of the pairs.

Surprisingly, with 16M values, std::sort took only about 1940 ms on the std::vector but about 2190 ms on the array. Can anybody explain how the vector can be faster? Is it due to caching, or is the array version of std::sort just poorly implemented?
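
For reference, a minimal sketch of the kind of benchmark described (not the original code; the random fill, the clock()-based timing, the element count, and the helper names are illustrative assumptions, and it should be compiled with optimizations, e.g. -O3):

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    // Compare only the float parts, as in the description above.
    static bool compare_first(const Item &a, const Item &b) {
        return a.first < b.first;
    }

    int main() {
        const std::size_t n = 16 * 1024 * 1024;   // "16M values"

        std::vector<Item> vec;
        Item *arr = new Item[n];

        for (std::size_t i = 0; i < n; ++i) {
            Item value(static_cast<float>(std::rand()), static_cast<unsigned int>(i));
            vec.push_back(value);   // vector filled with push_back
            arr[i] = value;         // array filled one by one, same data
        }

        std::clock_t t0 = std::clock();
        std::sort(vec.begin(), vec.end(), compare_first);
        std::clock_t t1 = std::clock();
        std::sort(arr, arr + n, compare_first);
        std::clock_t t2 = std::clock();

        std::printf("vector: %.0f ms, array: %.0f ms\n",
                    1000.0 * (t1 - t0) / CLOCKS_PER_SEC,
                    1000.0 * (t2 - t1) / CLOCKS_PER_SEC);

        delete[] arr;
        return 0;
    }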

gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6)

Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz - cache size 8192 KB (the computer has two quad-core CPUs but I assume the sort is only single-threaded)

EDIT: Now you can call me a dumbass, but when I tried to recreate the code I used for the measurements (I have already deleted the original), I cannot reproduce the results - now the array version takes about 1915 ± 5 ms (measured over 32 runs). I can only swear I ran the test on 10 measurements three times (manually) with similar results, but that is not rigorous proof.

There was probably some bug in the original code. A background process seems improbable, because I alternated measurements of the vector and array versions, the vector results still hold, and no other user was logged in.

Please, consider this question as closed. Thanks for your efforts.

  • How many times did you run the test? One run is not statistically meaningful at all. Commented Jun 16, 2011 at 13:37
  • You should provide the exact testing protocol you followed. For example, if you didn't execute each test more than once, or if you ran both tests in the same execution, the test isn't relevant. Commented Jun 16, 2011 at 13:38
  • On my i7 920 @ 2.67 GHz with gcc 4.5.2, I get 770 - 780 ms for the vector and 630 - 650 ms for the array. What optimization options did you use? (I used -march=native -O3, array and vector contained identical 16000000 pairs of random numbers) Commented Jun 16, 2011 at 13:39
  • @Konrad Given that he doesn't show us any of the code he's actually measuring, who knows? But the difference isn't very large (and he doesn't say how he measured it, either). Commented Jun 16, 2011 at 14:05
  • Are you using the same input data in both tests? Commented Jun 16, 2011 at 14:16

3 Answers


std::vector<std::pair<float, unsigned int>> (filled with push_back operation)

This stores all the data contiguously, so memory locality is very good.

std::pair<float, unsigned int> * array (allocated using new and then filled one by one)

This scatters the data all over memory.

You've set up a very unfair comparison between the vector and a simple array. The extra indirection involved in the array case is going to hurt, and the lack of locality is going to kill cache performance. I'm surprised you don't see a bigger win in favor of contiguous storage.
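
To make the contrast concrete, here is a minimal sketch of the two layouts being compared; the question shows no code, so the pointer-per-element variant is this answer's reading of it, and the names are illustrative:

    #include <cstddef>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    void show_layouts(std::size_t n) {
        // Contiguous: all n pairs live in one block; std::sort walks memory
        // sequentially and the hardware prefetcher can keep up.
        std::vector<Item> vec(n);

        // Scattered: the array itself is contiguous, but it holds pointers,
        // and each pair is a separate heap allocation that may land anywhere.
        Item **ptrs = new Item *[n];
        for (std::size_t i = 0; i < n; ++i)
            ptrs[i] = new Item(0.0f, static_cast<unsigned int>(i));

        // ... sort each, measure, then clean up ...
        for (std::size_t i = 0; i < n; ++i)
            delete ptrs[i];
        delete[] ptrs;
        (void)vec;
    }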


8 Comments

I don't think that this is right. If I understand correctly, the second allocates an array of std::pair<float, unsigned int>, using array new. Locality should be exactly the same.
@James: That's not what the question literally says, although I understand how you could interpret it that way (and your interpretation may very well be correct). The question says he has an array of pointers to pairs. This is why questions without code, and performance questions especially, are a terrible idea.
@Ben I understood it that he called std::sort on 1) an std::vector<std::pair<float, unsigned int> > and 2) a std::pair<float, unsigned int>*, but it's far from clear. (But I'd expect a far greater difference if he really did use an array of pointers in the second case.)
@James: If the allocations were done with good time locality, space locality would be mediocre and not outright terrible. That's the best explanation I can offer.
The recursive nature of quicksort along with the cache will make locality less important than you may think. Once you recurse down to some level, all the data is in cache and only indirection is causing a penalty, not locality. I'm assuming quicksort here.

They will use the same version of sort. It's quite likely random CPU effects, like caching or thread context switches.

5 Comments

How do you know they use the same version of sort? As I recall, the ISO standard does not specify how they are implemented. They probably just use the same interface/template.
@Yet - If he is using gcc (GCC) 4.4.5 20110214 we know what std::sort looks like.
@Yet Another Geek: Because vector iterators and array iterators are the same thing in release builds.
@DeadMG I thought arrays were iterated using pointers, and vectors through some kind of typesafe, class-based reference
@Yet Another Geek: In release builds, vector<T>::iterator can be typedefed as T*. Their specification is identical.
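
A quick way to check what your own standard library does; whether vector<T>::iterator is literally a T* or a thin wrapper (such as libstdc++'s __gnu_cxx::__normal_iterator) is implementation-specific, and either way it typically compiles down to the same code with optimizations on:

    #include <iostream>
    #include <typeinfo>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    int main() {
        // Prints implementation-specific (possibly mangled) type names;
        // identical output means the iterator really is a raw pointer.
        std::cout << "vector iterator: "
                  << typeid(std::vector<Item>::iterator).name() << '\n'
                  << "raw pointer:     " << typeid(Item *).name() << '\n';
        return 0;
    }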

Did you use -O3 to compile your code?

If not, do it. All other benchmark results are meaningless, especially for template code.

Did you run the test many times?

This is done to prevent things like interrupts and caching from having too much of an impact on your result.

Don't use floating-point comparison or arithmetic in benchmarks. The results depend heavily on the compiler, platform, compiler options, etc.

How was your test data created?

The time required by most sorting algorithms changes depending on the ordering of the input data.

Which method for measuring time did you use? Clock cycles? A timer?

Anyway, writing benchmarks that provide reliable results is not as easy as it might seem at first. Don't use a benchmark to determine what the proper code for your problem is.
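
As a rough illustration of the points above, here is a sketch of repeating the measurement on identical input, re-copied from an untouched master each run, and reporting every timing rather than a single number (the clock()-based timer and the names are assumptions, not the asker's code):

    #include <algorithm>
    #include <cstdio>
    #include <ctime>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    static bool compare_first(const Item &a, const Item &b) {
        return a.first < b.first;
    }

    void benchmark(const std::vector<Item> &master, int runs) {
        for (int r = 0; r < runs; ++r) {
            std::vector<Item> work(master);          // identical input every run
            std::clock_t start = std::clock();
            std::sort(work.begin(), work.end(), compare_first);
            std::clock_t stop = std::clock();
            std::printf("run %d: %.0f ms\n", r,
                        1000.0 * (stop - start) / CLOCKS_PER_SEC);
        }
    }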

