
Why would I be getting such poor performance from the code below?

The following command line uses 16 threads with a load of 60. On my machine it takes approximately 31 seconds to finish (with slight variation between runs):

testapp.exe 16 60

Using a load of 60, on Microsoft Windows Server 2008 R2 Enterprise SP1, running on 16 Intel Xeon E5-2670 @ 2.6 GHz CPUs I get the following performance:

1 cpu - 305 seconds

2 cpus - 155 seconds

4 cpus - 80 seconds

8 cpus - 45 seconds

10 cpus - 41 seconds

12 cpus - 37 seconds

14 cpus - 34 seconds

16 cpus - 31 seconds

18 cpus - 27 seconds

20 cpus - 24 seconds

22 cpus - 23 seconds

24 cpus - 21 seconds

26 cpus - 20 seconds

28 cpus - 19 seconds

After this it flat-lines ...

I get approximately the same performance using .Net 3.5, 4, 4.5 or 4.5.1.

I understand the drop-off in performance after 22 cpus, as I only have 16 on the box. What I don't understand is the poor performance after 8 cpus. Can anyone explain? Is this normal?

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

internal static class Program
{
    private static void Main(string[] args)
    {
        // First argument: number of threads (defaults to the logical processor count).
        int threadCount;
        if (args == null || args.Length < 1 || !int.TryParse(args[0], out threadCount))
            threadCount = Environment.ProcessorCount;

        // Second argument: load factor (defaults to 1).
        int load;
        if (args == null || args.Length < 2 || !int.TryParse(args[1], out load))
            load = 1;

        Console.WriteLine("ThreadCount:{0} Load:{1}", threadCount, load);

        List<Thread> threads = new List<Thread>();

        // Create all worker threads up front; none are started yet.
        for (int i = 0; i < threadCount; i++)
        {
            int i1 = i;
            threads.Add(new Thread(() => DoWork(i1, threadCount, load)));
        }

        Stopwatch timer = Stopwatch.StartNew();

        foreach (var thread in threads)
        {
            thread.Start();
        }

        foreach (var thread in threads)
        {
            thread.Join();
        }

        timer.Stop();

        Console.WriteLine("Time:{0} seconds", timer.ElapsedMilliseconds/1000.0);
    }

    // The total work is fixed by load and divided evenly between the threads.
    static void DoWork(int seed, int threadCount, int load)
    {
        double[,] mtx = new double[3,3];

        for (int i = 0; i < ((100000 * load)/threadCount); i++)
        {
            for (int j = 0; j < 100; j++)
            {
                // A new 3x3 array is allocated on the heap on every pass.
                mtx = new double[3,3];

                for (int k = 0; k < 3; k++)
                {
                    for (int l = 0; l < 3; l++)
                    {
                        mtx[k, l] = Math.Sin(j + (k*3) + l + seed);
                    }
                }
            }
        }
    }
}
  • Note that if you compare like for like and look at 1, 2, 4, 8, 16 (i.e. leave out the relatively smaller 10, 12, 14 steps), there's still a relatively "big" drop from 45 -> 31. Commented Sep 11, 2015 at 16:05
  • I'm not sure that you're benchmarking actual computations there. It seems like what you're really benchmarking is concurrent heap allocations. Commented Sep 11, 2015 at 16:06
  • How much time is spent in GC? Are you using the client or the server GC? Commented Sep 11, 2015 at 16:07
  • @displayName It's in the code sample, scroll down. Commented Sep 11, 2015 at 16:10
  • I would recommend two (alternative) changes to this experiment: (1) preallocate some new double[,] arrays in the starting thread, pass one to each child thread and then reuse it instead of reallocating it in the loop, or (2) stackalloc a double[3 * 3] in the loops and use that. Otherwise, you may be accidentally benchmarking the performance of the memory allocator or the garbage collector under rapid allocations, instead of your code per se. Commented Sep 11, 2015 at 16:18
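The suggestions in the comments above (reuse a preallocated array, and check which GC is in use) could be tried with something like the following. This is only a sketch: the class name ReuseBufferSketch and the method DoWorkReusing are hypothetical names, not part of the question's code, and the run in Main is deliberately small. It assumes .NET 4.5 or later, where GCSettings.IsServerGC is available.

```csharp
using System;
using System.Runtime;

public static class ReuseBufferSketch
{
    // Same arithmetic as DoWork in the question, but the 3x3 array is supplied
    // by the caller and reused instead of being reallocated on every pass.
    public static double DoWorkReusing(double[,] mtx, int seed, int threadCount, int load)
    {
        int outer = (100000 * load) / threadCount;
        for (int i = 0; i < outer; i++)
        {
            for (int j = 0; j < 100; j++)
            {
                for (int k = 0; k < 3; k++)
                {
                    for (int l = 0; l < 3; l++)
                    {
                        mtx[k, l] = Math.Sin(j + (k * 3) + l + seed);
                    }
                }
            }
        }
        // Return one element so the computation has an observable result.
        return mtx[2, 2];
    }

    public static void Main()
    {
        // Report which GC flavour and latency mode the process is running with.
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);
        Console.WriteLine("GC latency mode: {0}", GCSettings.LatencyMode);

        // One allocation up front; in the real benchmark each thread would get its own.
        double[,] buffer = new double[3, 3];

        int gen0Before = GC.CollectionCount(0);
        double last = DoWorkReusing(buffer, 0, 100, 1); // small run: 1000 outer iterations
        int gen0After = GC.CollectionCount(0);

        Console.WriteLine("Last element: {0}", last);
        Console.WriteLine("Gen 0 collections during run: {0}", gen0After - gen0Before);
    }
}
```

Comparing the generation 0 collection count of this variant with the same measurement around the original DoWork should indicate how much of the benchmark is allocation and collection rather than Math.Sin.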

2 Answers


Please refer to the Intel ARK entry for the Xeon E5-2670.

This particular processor has 8 physical cores which are hyper-threaded. This is why you see a performance drop after 8 threads. Calling Environment.ProcessorCount gets 16 logical cores (2 logical cores per physical core because they are hyperthreaded).

A similar question has been answered on SuperUser.

You can try setting the affinity of the threads to see whether it makes a difference, but the scheduler usually does a good job of allocating resources.
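As a rough illustration of the affinity experiment, the sketch below restricts the whole process to every other logical processor via Process.ProcessorAffinity. It rests on a loud assumption: that hyper-threaded siblings are numbered adjacently (0/1, 2/3, ...), which is not guaranteed and should be checked for the actual machine. The class and method names are hypothetical, and per-thread affinity (ProcessThread.ProcessorAffinity) would be a different, finer-grained approach.

```csharp
using System;
using System.Diagnostics;

public static class AffinitySketch
{
    // Builds a bitmask selecting logical processors 0, 2, 4, ...
    // ASSUMPTION: hyper-threaded siblings are numbered adjacently (0/1, 2/3, ...),
    // so picking every other index yields one logical processor per physical core.
    public static long EveryOtherProcessorMask(int logicalProcessorCount)
    {
        long mask = 0;
        // Stop below bit 63 so the mask stays a positive 64-bit value.
        for (int cpu = 0; cpu < logicalProcessorCount && cpu < 63; cpu += 2)
        {
            mask |= 1L << cpu;
        }
        return mask;
    }

    public static void Main()
    {
        long mask = EveryOtherProcessorMask(Environment.ProcessorCount);
        Console.WriteLine("Requested affinity mask: 0x{0:X}", mask);

        try
        {
            // Applies to every thread in the current process.
            Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)mask;
            Console.WriteLine("Affinity applied.");
        }
        catch (Exception ex)
        {
            // Setting affinity is not supported on every platform.
            Console.WriteLine("Could not set affinity: {0}", ex.Message);
        }
    }
}
```

Running the benchmark with and without such a mask, at a thread count equal to the number of physical cores, is one way to see whether thread placement matters here.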

Hope this helps.


2 Comments

The machine has 32 logical processors, not 16
@Cronan: Can you provide any link which has the specifications of your CPU? I am seeing another link on Amazon: amazon.com/Intel-E5-2670-2-60Ghz-8-Core-Processor/dp/B007H29FRS and it says that core count is 8 and not 16. Or use this: superuser.com/questions/226552/… and let me know what core count your CPU says it has.

It is not the threads themselves that cause the performance to go down, but the "creation" of each thread.

Instead of creating a brand new thread, you need to borrow an already created thread from the OS thread pool. Use the ThreadPool class instead of new Thread().
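For reference, a minimal sketch of what this suggestion might look like is shown below. The names ThreadPoolSketch and RunOnPool are hypothetical, and CountdownEvent requires .NET 4 or later. Whether this changes the timings for the benchmark in the question is not established here; the sketch only shows the mechanics of queuing work and waiting for it.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSketch
{
    // Queues workerCount work items on the thread pool, blocks until all of
    // them have finished, and returns how many completed without throwing.
    public static int RunOnPool(int workerCount, Action<int> work)
    {
        int completed = 0;
        using (var done = new CountdownEvent(workerCount))
        {
            for (int i = 0; i < workerCount; i++)
            {
                int index = i; // copy the loop variable for the closure
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        work(index);
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        done.Signal(); // always signal so Wait() cannot hang
                    }
                });
            }
            done.Wait();
        }
        return completed;
    }

    public static void Main()
    {
        double[] results = new double[Environment.ProcessorCount];

        // Each work item writes only to its own slot, so no locking is needed.
        int finished = RunOnPool(results.Length, i => results[i] = Math.Sin(i));

        Console.WriteLine("Work items finished: {0}", finished);
    }
}
```

Note that the pool decides how many items run concurrently, so unlike explicit Thread objects this does not guarantee one dedicated thread per work item.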

1 Comment

I'm not creating new threads in a fast loop, I create the "worker" threads I want to use ahead of time, but thank you for answering
