
I have just tried to benchmark std::sort on both a std::vector<std::pair<float, unsigned int>> (filled with push_back) and a plain std::pair<float, unsigned int> * array (allocated using new and then filled one by one). The compare function just compared the float parts of the pairs.

Surprisingly, with 16M values, std::sort took only about 1940 ms on the std::vector but about 2190 ms on the array. Can anybody explain how the vector can be faster? Is it due to caching, or is the array version of std::sort just poorly implemented?
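
For reference, a minimal sketch of the kind of benchmark described (not the original code; the random fill, the clock()-based timing, the element count, and the helper names are illustrative assumptions, and it should be compiled with optimizations, e.g. -O3):

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    // Compare only the float parts, as in the description above.
    static bool compare_first(const Item &a, const Item &b) {
        return a.first < b.first;
    }

    int main() {
        const std::size_t n = 16 * 1024 * 1024;   // "16M values"

        std::vector<Item> vec;
        Item *arr = new Item[n];

        for (std::size_t i = 0; i < n; ++i) {
            Item value(static_cast<float>(std::rand()), static_cast<unsigned int>(i));
            vec.push_back(value);   // vector filled with push_back
            arr[i] = value;         // array filled one by one, same data
        }

        std::clock_t t0 = std::clock();
        std::sort(vec.begin(), vec.end(), compare_first);
        std::clock_t t1 = std::clock();
        std::sort(arr, arr + n, compare_first);
        std::clock_t t2 = std::clock();

        std::printf("vector: %.0f ms, array: %.0f ms\n",
                    1000.0 * (t1 - t0) / CLOCKS_PER_SEC,
                    1000.0 * (t2 - t1) / CLOCKS_PER_SEC);

        delete[] arr;
        return 0;
    }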

gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6)

Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz - cache size 8192 KB (the computer has two quad-core CPUs but I assume the sort is only single-threaded)

EDIT: Now you can call me a dumbass, but when I tried to recreate the code I used for the measurements (I have already deleted the original), I cannot reproduce the results - now the array version takes about 1915 ± 5 ms (measured over 32 runs). I can only swear I ran the test on 10 measurements three times (manually) with similar results, but that is not rigorous proof.

There was probably some bug in the original code. A background process seems improbable, because I alternated measurements of the vector and array versions, the vector results still hold, and no other user was logged in.

Please, consider this question as closed. Thanks for your efforts.

  • How many times did you run the test? One run is not statistically meaningful at all. Commented Jun 16, 2011 at 13:37
  • You should provide the exact testing protocol you followed. For example, if you didn't execute each test more than once, or if you ran both tests in the same execution, the test isn't relevant. Commented Jun 16, 2011 at 13:38
  • On my i7 920 @ 2.67 GHz with gcc 4.5.2, I get 770 - 780 ms for the vector and 630 - 650 ms for the array. What optimization options did you use? (I used -march=native -O3, array and vector contained identical 16000000 pairs of random numbers) Commented Jun 16, 2011 at 13:39
  • @Konrad Given that he doesn't show us any of the code he's actually measuring, who knows? But the difference isn't very large (and he doesn't say how he measured it, either). Commented Jun 16, 2011 at 14:05
  • Are you using the same input data in both tests? Commented Jun 16, 2011 at 14:16

3 Answers


std::vector<std::pair<float, unsigned int>> (filled with push_back operation)

This stores all the data contiguously, so memory locality is very good.

std::pair<float, unsigned int> * array (allocated using new and then filled one by one)

This scatters the data all over memory.

You've set up a very unfair comparison between the vector and a simple array. The extra indirection involved in the array case is going to hurt, and the lack of locality is going to kill cache performance. I'm surprised you don't see a bigger win in favor of contiguous storage.
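
To make the contrast concrete, here is a minimal sketch of the two layouts being compared; the question shows no code, so the pointer-per-element variant is this answer's reading of it, and the names are illustrative:

    #include <cstddef>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    void show_layouts(std::size_t n) {
        // Contiguous: all n pairs live in one block; std::sort walks memory
        // sequentially and the hardware prefetcher can keep up.
        std::vector<Item> vec(n);

        // Scattered: the array itself is contiguous, but it holds pointers,
        // and each pair is a separate heap allocation that may land anywhere.
        Item **ptrs = new Item *[n];
        for (std::size_t i = 0; i < n; ++i)
            ptrs[i] = new Item(0.0f, static_cast<unsigned int>(i));

        // ... sort each, measure, then clean up ...
        for (std::size_t i = 0; i < n; ++i)
            delete ptrs[i];
        delete[] ptrs;
        (void)vec;
    }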


8 Comments

I don't think that this is right. If I understand correctly, the second allocates an array of std::pair<float, unsigned int>, using array new. Locality should be exactly the same.
@James: That's not what the question literally says, although I understand how you could interpret it that way (and your interpretation may very well be correct). The question says he has an array of pointers to pairs. This is why questions without code, and performance questions especially, are a terrible idea.
@Ben I understood it that he called std::sort on 1) an std::vector<std::pair<float, unsigned int> > and 2) a std::pair<float, unsigned int>*, but it's far from clear. (But I'd expect a far greater difference if he really did use an array of pointers in the second case.)
@James: If the allocations were done with good time locality, space locality would be mediocre and not outright terrible. That's the best explanation I can offer.
The recursive nature of quicksort along with the cache will make locality less important than you may think. Once you recurse down to some level, all the data is in cache and only indirection is causing a penalty, not locality. I'm assuming quicksort here.

They will use the same version of sort. It's quite likely random CPU effects, like caching or thread context switches.

5 Comments

How do you know they use the same version of sort? As I recall, the ISO standard does not specify how they are implemented. They probably just use the same interface/template.
@Yet - If he is using gcc (GCC) 4.4.5 20110214 we know what std::sort looks like.
@Yet Another Geek: Because vector iterators and array iterators are the same thing in release builds.
@DeadMG I thought arrays were iterated using pointers, and vectors through some kind of typesafe, class-based reference
@Yet Another Geek: In release builds, vector<T>::iterator can be typedefed as T*. Their specification is identical.
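
A quick way to check what your own standard library does; whether vector<T>::iterator is literally a T* or a thin wrapper (such as libstdc++'s __gnu_cxx::__normal_iterator) is implementation-specific, and either way it typically compiles down to the same code with optimizations on:

    #include <iostream>
    #include <typeinfo>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    int main() {
        // Prints implementation-specific (possibly mangled) type names;
        // identical output means the iterator really is a raw pointer.
        std::cout << "vector iterator: "
                  << typeid(std::vector<Item>::iterator).name() << '\n'
                  << "raw pointer:     " << typeid(Item *).name() << '\n';
        return 0;
    }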

Did you use -O3 to compile your code?

If not, do it. All other benchmark results are meaningless, especially for template code.

Did you run the test many times?

This is done to prevent things like interrupts and caching from having too much of an impact on your result.

Don't use floating-point comparison or arithmetic in benchmarks. The results depend heavily on the compiler, platform, compiler options, etc.

How was your test data created?

The time required by most sorting algorithms changes depending on the ordering of the input data.

Which method for measuring time did you use? Clock cycles? A timer?

Anyway, writing benchmarks that provide reliable results is not as easy as it might seem at first. Don't use a benchmark to determine what the proper code for your problem is.
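
As a rough illustration of the points above, here is a sketch of repeating the measurement on identical input, re-copied from an untouched master each run, and reporting every timing rather than a single number (the clock()-based timer and the names are assumptions, not the asker's code):

    #include <algorithm>
    #include <cstdio>
    #include <ctime>
    #include <utility>
    #include <vector>

    typedef std::pair<float, unsigned int> Item;

    static bool compare_first(const Item &a, const Item &b) {
        return a.first < b.first;
    }

    void benchmark(const std::vector<Item> &master, int runs) {
        for (int r = 0; r < runs; ++r) {
            std::vector<Item> work(master);          // identical input every run
            std::clock_t start = std::clock();
            std::sort(work.begin(), work.end(), compare_first);
            std::clock_t stop = std::clock();
            std::printf("run %d: %.0f ms\n", r,
                        1000.0 * (stop - start) / CLOCKS_PER_SEC);
        }
    }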

