
I want to sort an array of ints with a length between 1,000,000 and 100,000,000. The program will run on a Core 2 Duo machine with 2 MB of cache, using the pthread library. I want the fastest algorithm!

I have written semi-parallel sorting code based on the merge sort algorithm, but it's not fast enough! Its structure looks like this:

          ___ sort___   
         /           \        
        /____ sort ___\     __ merge __
    ___/               \___/           \___ merge 
       \ ____ sort ____/   \__ merge __/    
        \             /      
         \___ sort __/      
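
For readers who want something concrete to compare against, here is a minimal sketch of the layout in the diagram. It is not the poster's code, which was never shown. It assumes four threads that each sort one quarter with the C library's `qsort`, two threads that merge neighbouring quarters into a temporary buffer, and a final serial merge back into the array. All names (`job`, `run_all`, `four_way_mergesort`) are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job descriptor: half-open ranges [lo, mid) and [mid, hi). */
typedef struct { int *src, *dst; long lo, mid, hi; } job;

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts src[lo, hi) in place; mid and dst are unused here. */
static void *sort_job(void *arg) {
    job *j = arg;
    qsort(j->src + j->lo, (size_t)(j->hi - j->lo), sizeof(int), cmp_int);
    return NULL;
}

/* Merges sorted src[lo, mid) and src[mid, hi) into dst[lo, hi). */
static void *merge_job(void *arg) {
    job *j = arg;
    long a = j->lo, b = j->mid, k = j->lo;
    while (a < j->mid && b < j->hi)
        j->dst[k++] = (j->src[b] < j->src[a]) ? j->src[b++] : j->src[a++];
    while (a < j->mid) j->dst[k++] = j->src[a++];
    while (b < j->hi)  j->dst[k++] = j->src[b++];
    return NULL;
}

/* Runs fn on up to four jobs, one thread each; a job whose thread
   cannot be created is run on the calling thread instead. */
static void run_all(void *(*fn)(void *), job *jobs, int n) {
    pthread_t tid[4];
    int ok[4];
    assert(n <= 4);
    for (int i = 0; i < n; i++)
        ok[i] = pthread_create(&tid[i], NULL, fn, &jobs[i]) == 0;
    for (int i = 0; i < n; i++) {
        if (ok[i]) pthread_join(tid[i], NULL);
        else fn(&jobs[i]);
    }
}

void four_way_mergesort(int *a, long n) {
    if (n < 2) return;
    int *tmp = malloc((size_t)n * sizeof *tmp);   /* extra O(n) buffer */
    if (!tmp) abort();
    long q1 = n / 4, q2 = n / 2, q3 = q2 + (n - q2) / 2;

    job s[4] = { {a, NULL, 0, 0, q1}, {a, NULL, q1, 0, q2},
                 {a, NULL, q2, 0, q3}, {a, NULL, q3, 0, n} };
    run_all(sort_job, s, 4);                      /* four parallel sorts */

    job m[2] = { {a, tmp, 0, q1, q2}, {a, tmp, q2, q3, n} };
    run_all(merge_job, m, 2);                     /* two parallel merges */

    job f = { tmp, a, 0, q2, n };
    merge_job(&f);                                /* final merge is serial */
    free(tmp);
}
```

Calling `four_way_mergesort(data, n)` sorts `data` in ascending order. Note that the last step touches all `n` elements on a single thread, so it limits how much the extra threads can help.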
  • What have you tried? What isn't working? Show us a code snippet you're having problems with. Commented Nov 10, 2011 at 10:44
  • I have written semi-parallel sorting code that uses the merge sort algorithm. Commented Nov 10, 2011 at 11:35
  • If you found that it wasn't any faster, then you probably discovered that your machine has multiple cores but only one memory bus, which is the true bottleneck. Commented Nov 10, 2011 at 12:12
  • It uses shared memory, but I think each core has access to memory independently. I also tested it on an i5 with 4 MB of cache and got the same performance! I'm not sure, but I think the final merge, which doesn't run in parallel, reduces speed significantly. Commented Nov 10, 2011 at 12:23

3 Answers

2

It's been a while since I was at university, but I seem to remember that the PSRS (Parallel Sorting by Regular Sampling) algorithm was good for this sort of thing. I am sure Google will turn up plenty of implementations and pseudocode.


0

Quicksort lends itself to multithreading nicely.

After each partition step, sort one side of the partition in the current thread and the other side in a new thread.
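
A minimal sketch of that idea with pthreads follows. The names (`qs_task`, `qs_run`, `parallel_quicksort`) and the `depth` limit are assumptions added for illustration: without some limit, every partition step would create a thread, so the sketch only spawns while `depth` is above zero and sorts sequentially below that.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task descriptor; not taken from any library. */
typedef struct {
    int *a;        /* array being sorted */
    long lo, hi;   /* inclusive bounds of this task */
    int depth;     /* remaining levels at which a new thread may be spawned */
} qs_task;

/* Hoare partition around the middle element. Returns j such that every
   element of a[lo..j] is <= every element of a[j+1..hi]. Requires lo < hi. */
static long partition(int *a, long lo, long hi) {
    int pivot = a[lo + (hi - lo) / 2];
    long i = lo - 1, j = hi + 1;
    for (;;) {
        do { i++; } while (a[i] < pivot);
        do { j--; } while (a[j] > pivot);
        if (i >= j) return j;
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

static void *qs_run(void *arg) {
    qs_task *t = arg;
    if (t->lo >= t->hi) return NULL;

    long p = partition(t->a, t->lo, t->hi);
    int next = t->depth > 0 ? t->depth - 1 : 0;
    qs_task left  = { t->a, t->lo, p, next };
    qs_task right = { t->a, p + 1, t->hi, next };

    pthread_t tid;
    int spawned = 0;
    if (t->depth > 0)   /* right side goes to a new thread, if one can be created */
        spawned = (pthread_create(&tid, NULL, qs_run, &right) == 0);

    qs_run(&left);      /* left side stays on the current thread */

    if (spawned) pthread_join(tid, NULL);  /* right lives on our stack, so join before returning */
    else qs_run(&right);
    return NULL;
}

/* depth = 1 gives two threads in total, depth = 2 up to four, and so on. */
void parallel_quicksort(int *a, long n, int depth) {
    assert(a != NULL || n == 0);
    if (n < 2) return;
    qs_task root = { a, 0, n - 1, depth };
    qs_run(&root);
}
```

On a two-core machine, `parallel_quicksort(data, n, 1)` would be the natural call. The two halves are only as balanced as the first pivot, so one thread may finish well before the other.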


0

Since you are on a Core 2 Duo, I would look at a parallel quicksort algorithm. It sorts in place, which conserves memory, and its speedup is roughly proportional to the number of processors as long as that number is small.

A parallel quicksort basically performs the partition step, then quicksorts the left and right sublists in separate threads. This can be accomplished by storing the sublist bounds in a shared stack, which ultimately becomes the point of contention when run with larger thread counts.
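
The shared-stack variant could be sketched with pthreads roughly as follows. This is an illustration under stated assumptions, not a port of the Java code mentioned in this answer: the names (`pool`, `push`, `worker`, `stack_quicksort`), the `CUTOFF` below which a range is finished with insertion sort, and the `busy` counter used to detect termination are all made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names; CUTOFF and MAX_THREADS are untuned guesses. */
enum { CUTOFF = 32, MAX_THREADS = 16 };
typedef struct { long lo, hi; } range;   /* inclusive bounds of an unsorted span */
typedef struct {
    int *a;
    range *stack;            /* shared stack of pending ranges */
    size_t top, cap;
    int busy;                /* workers that may still push more ranges */
    pthread_mutex_t mu;
    pthread_cond_t cv;
} pool;

/* Hoare partition around the middle element; requires lo < hi. */
static long partition(int *a, long lo, long hi) {
    int pivot = a[lo + (hi - lo) / 2];
    long i = lo - 1, j = hi + 1;
    for (;;) {
        do { i++; } while (a[i] < pivot);
        do { j--; } while (a[j] > pivot);
        if (i >= j) return j;
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

static void insertion_sort(int *a, long lo, long hi) {
    for (long i = lo + 1; i <= hi; i++) {
        int x = a[i];
        long j = i - 1;
        while (j >= lo && a[j] > x) { a[j + 1] = a[j]; j--; }
        a[j + 1] = x;
    }
}

/* Every push takes the lock: this is the contention point. */
static void push(pool *p, range r) {
    pthread_mutex_lock(&p->mu);
    if (p->top == p->cap) {
        p->cap *= 2;
        p->stack = realloc(p->stack, p->cap * sizeof *p->stack);
        if (!p->stack) abort();
    }
    p->stack[p->top++] = r;
    pthread_cond_signal(&p->cv);
    pthread_mutex_unlock(&p->mu);
}

static void *worker(void *arg) {
    pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->mu);
        while (p->top == 0 && p->busy > 0)       /* work may still appear */
            pthread_cond_wait(&p->cv, &p->mu);
        if (p->top == 0) {                       /* empty and nobody busy: done */
            pthread_mutex_unlock(&p->mu);
            return NULL;
        }
        range r = p->stack[--p->top];
        p->busy++;
        pthread_mutex_unlock(&p->mu);
        while (r.hi - r.lo >= CUTOFF) {          /* share the right part, keep the left */
            long j = partition(p->a, r.lo, r.hi);
            push(p, (range){ j + 1, r.hi });
            r.hi = j;
        }
        insertion_sort(p->a, r.lo, r.hi);
        pthread_mutex_lock(&p->mu);
        if (--p->busy == 0 && p->top == 0)
            pthread_cond_broadcast(&p->cv);      /* wake waiters so they can exit */
        pthread_mutex_unlock(&p->mu);
    }
}

void stack_quicksort(int *a, long n, int nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    if (n < 2) return;
    pool p = { .a = a, .top = 0, .cap = 64, .busy = 0 };
    p.stack = malloc(p.cap * sizeof *p.stack);
    if (!p.stack) abort();
    pthread_mutex_init(&p.mu, NULL);
    pthread_cond_init(&p.cv, NULL);
    p.stack[p.top++] = (range){ 0, n - 1 };
    pthread_t tid[MAX_THREADS];
    int started = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, worker, &p) == 0) started++;
    if (started == 0) worker(&p);                /* fall back to the calling thread */
    for (int i = 0; i < started; i++) pthread_join(tid[i], NULL);
    pthread_cond_destroy(&p.cv);
    pthread_mutex_destroy(&p.mu);
    free(p.stack);
}
```

With two cores, `stack_quicksort(data, n, 2)` is the obvious starting point. Compared with spawning a thread per partition, the thread count stays fixed and idle workers pick up whatever ranges are pending, at the price of a lock acquisition on every push and pop.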

There are other algorithms, such as PSRS, that scale to larger numbers of processors. A Core 2 Duo, however, has only two physical cores and no Hyper-Threading, so the extra memory that PSRS needs would probably be wasted. Given the number of elements you want to sort, you will probably need to conserve memory.

I have implemented both in Java on GitHub. Let me know if you'd care to look at the code as a guide to implementing something with pthreads.
