0

I am trying to understand how the "Parallel.ForEach" function works. I have read the MSDN about it and as an overall understanding it seems that, the more computations you put in the loop the faster it goes in comparison to a normal foreach loop.

Current computer has 4 Cores. (I will have installed my 24 core computer in two days.)

However I think I need some expertise insight in this. I have a sorting algorithm that do a sort in about 10 seconds on 1 core.

I have put a complete code where I do it on one core and where I do it in a: Parallel.ForEach loop with 4 cores.

I simply wonder if it is possible to speed up this sort using more cores in any possible way?

Running testsortFunction() produces below result:

enter image description here

    void testsortFunction()
    {
        String resultstring1 = ""; String resultstring2 = "";
        resultstring1 = sortingtestBENCHMARKS(false);
        resultstring2 = sortingtestBENCHMARKS(true);

        MessageBox.Show(resultstring1 + "\n\n" + resultstring2);
    }
    String sortingtestBENCHMARKS(bool useParallel)
    {
        List<String> sortedLIST = new List<String>(); 
        List<String> minusLIST = new List<String>(); 
        List<String> plusLIST = new List<String>(); 
        var stopwatch = new Stopwatch();

        //Add 3 million elements
        for (double i = 0; i < 1000000; i += 1)
        {
            plusLIST.Add(i + ",awb/aje" + " - " + "ddfas/asa" + " - " + "asoo/qwa");
        }
        for (double i = 1; i < 2000000; i += 1)
        {
            minusLIST.Add("-" + i + ",awb/aje" + " - " + "ddfas/asa" + " - " + "asoo/qwa");
        }

        //Do the sorting!
        if (useParallel == false)
        {
            stopwatch.Start();
            sortedLIST = sortLIST(minusLIST, plusLIST); //11 seconds
            stopwatch.Stop();
        }
        else
        {
            stopwatch.Start();
            Parallel.ForEach("dummy", (c) =>
            {
                sortedLIST = sortLIST(minusLIST, plusLIST); //32 seconds
            });
            stopwatch.Stop();
        }
        return "Elapsed Times in seconds(Using Parallel: " + useParallel + "):\n\n" + stopwatch.Elapsed.TotalSeconds; //10.57 seconds
    }
    List<String> sortLIST(List<String> minusLIST, List<String> plusLIST)
    {
        plusLIST = plusLIST.OrderBy(i => double.Parse(i.Split(',')[0])).ToList();plusLIST.Reverse();
        minusLIST = minusLIST.OrderBy(i => double.Parse(i.Split(',')[0].TrimStart('-'))).ToList();
        plusLIST.AddRange(minusLIST);
        return plusLIST;
    }
22
  • 1
    Use PLINQ instead, eg someList.AsParallel().OrderBy(...). Parallel sorting algorithms are very different beasts. If every core tried to access every item all CPU time would be wasted in thrashing and locking. To get any performance boost you need to partition the data and use one task to sort each partition. The sorted partitions have to be merged at the end. Commented Feb 21, 2019 at 17:31
  • 3
    This test is severely flawed. Your Parallel.ForEach is doing much more work. Commented Feb 21, 2019 at 17:31
  • 2
    BTW your test isn't sorting anything in parallel. It's repeating the same sort operation, as many times as there are cores Commented Feb 21, 2019 at 17:33
  • 2
    @PanagiotisKanavos oh heck you're right - it's performing one sort for each of the characters in "dummy" -- so it's doing exactly the same thing 5 times. It's doing them in parallel mind you, so it's only just over 3 times slower. Commented Feb 21, 2019 at 17:34
  • 1
    BTW why do you use double.Parse(i.Split(',')[0]) ?? If the text contains European-style decimal separators pass the appropriate CultureInfo object to double.Parse. Each string operation results in a new temporary string. This means that your code will generate 3 million extra temporary strings Commented Feb 21, 2019 at 17:35

2 Answers 2

2

I ran some benchmarks, and managed to speed it up by a factor of 2...

public List<String> ParallelSort()
{
    var result = new List<string>(plusLIST.Count + minusLIST.Count);

    var t1 = Task.Run(() => plusLIST.AsParallel().OrderByDescending(i => int.Parse(i.Split(',')[0])));
    var t2 = Task.Run(() => minusLIST.AsParallel().OrderBy(i => int.Parse(i.Split(',')[0].TrimStart('-'))));
    Task.WaitAll(t1, t2);

    result.AddRange(t1.Result);
    result.AddRange(t2.Result);
    return result;
}

Here's the benchmark, using BenchmarkDotNet.

(I reduced the list size by a factor of 10, because benchmarking takes a long time and I want to go home tonight).

|                 Method |      Mean |    Error |    StdDev |    Median | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|----------------------- |----------:|---------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
|                 OPSort | 142.35 ms | 5.150 ms | 14.693 ms | 137.40 ms |  29000.0000 |   1000.0000 |           - |           135.99 MB |
| OPSortByDescendingLazy | 118.19 ms | 2.672 ms |  7.752 ms | 117.01 ms |  29000.0000 |   1000.0000 |           - |           127.32 MB |
|   SlightlyParallelSort | 122.62 ms | 3.063 ms |  8.538 ms | 120.45 ms |  29000.0000 |   1000.0000 |           - |           127.32 MB |
|           ParallelSort |  71.37 ms | 2.190 ms |  6.389 ms |  70.30 ms |  28000.0000 |   1000.0000 |           - |           148.63 MB |
|              RegexSort | 250.80 ms | 4.887 ms |  7.315 ms | 251.70 ms |  32000.0000 |   1000.0000 |           - |           145.99 MB |

And the code:

[MemoryDiagnoser]
public class Tests
{
    private List<String> minusLIST = new List<string>(100000);
    private List<String> plusLIST = new List<string>(200000);

    [IterationSetup]
    public void IterationSetup()
    {
        plusLIST.Clear();
        minusLIST.Clear();

        //Add 3 million elements
        for (double i = 0; i < 100000; i += 1)
        {
            plusLIST.Add(i + ",awb/aje" + " - " + "ddfas/asa" + " - " + "asoo/qwa");
        }
        for (double i = 1; i < 200000; i += 1)
        {
            minusLIST.Add("-" + i + ",awb/aje" + " - " + "ddfas/asa" + " - " + "asoo/qwa");
        }
    }

    [Benchmark]
    public List<String> OPSort()
    {
        plusLIST = plusLIST.OrderBy(i => double.Parse(i.Split(',')[0])).ToList(); plusLIST.Reverse();
        minusLIST = minusLIST.OrderBy(i => double.Parse(i.Split(',')[0].TrimStart('-'))).ToList();
        plusLIST.AddRange(minusLIST);
        return plusLIST;
    }

    [Benchmark]
    public List<String> OPSortByDescendingLazy()
    {
        var result = new List<string>(plusLIST.Count + minusLIST.Count);

        result.AddRange(plusLIST.OrderByDescending(i => int.Parse(i.Split(',')[0])));
        result.AddRange(minusLIST.OrderBy(i => int.Parse(i.Split(',')[0].TrimStart('-'))));
        return result;
    }

    [Benchmark]
    public List<String> SlightlyParallelSort()
    {
        var result = new List<string>(plusLIST.Count + minusLIST.Count);

        var t1 = Task.Run(() => plusLIST.OrderByDescending(i => int.Parse(i.Split(',')[0])));
        var t2 = Task.Run(() => minusLIST.OrderBy(i => int.Parse(i.Split(',')[0].TrimStart('-'))));
        Task.WaitAll(t1, t2);

        result.AddRange(t1.Result);
        result.AddRange(t2.Result);
        return result;
    }

    [Benchmark]
    public List<String> ParallelSort()
    {
        var result = new List<string>(plusLIST.Count + minusLIST.Count);

        var t1 = Task.Run(() => plusLIST.AsParallel().OrderByDescending(i => int.Parse(i.Split(',')[0])));
        var t2 = Task.Run(() => minusLIST.AsParallel().OrderBy(i => int.Parse(i.Split(',')[0].TrimStart('-'))));
        Task.WaitAll(t1, t2);

        result.AddRange(t1.Result);
        result.AddRange(t2.Result);
        return result;
    }

    private static readonly Regex splitRegex = new Regex(@"^(\d+)");
    private static readonly Regex splitWithMinusRegex = new Regex(@"^-(\d+)");

    // To test the suggestion from @PanagiotisKanavos 
    [Benchmark]
    public List<String> RegexSort()
    {
        plusLIST = plusLIST.OrderBy(i => double.Parse(splitRegex.Match(i).Groups[1].Value)).ToList(); plusLIST.Reverse();
        minusLIST = minusLIST.OrderBy(i => double.Parse(splitWithMinusRegex.Match(i).Groups[1].Value)).ToList();
        plusLIST.AddRange(minusLIST);
        return plusLIST;
    }
}

class Program
{
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Tests>();

        Console.WriteLine("========= DONE! ============");
        Console.ReadLine();
    }
}

It's probably possible to speed it up more: the next step is to break out a profiler.

Note that your arrays are large enough to be allocated on the Large Object Heap, which is generally bad news. You want to avoid as many of these sorts of allocations as you can, so don't let your lists figure out what size they need to be by themselves.

Sign up to request clarification or add additional context in comments.

7 Comments

Very interesting, I will study your code. I wonder: Task.Run(() => plusLIST.AsParallel. Does this actually use threads on ONE core or will that use the power of for example 24 cores. As I think new Thread() works up to 4 times faster but than starts to be slower as new Thread() works/ is isolated on one core only?
You can see from the benchmarking that AsParallel() speeds it up a bit. Using Task.Run means I can sort plusLIST and minusLIST at the same time.
Does AsParallel use more cores(ex: 24 cores) or only more threads on one core? and the same does Task.Run use more cores(ex: 24 cores) or only more threads on one core?
Threads aren't assigned to core - the OS will schedule threads to cores as it sees fit.
I see, I remember I did special tests some time ago for that. When I did new Thread() on the 24 cores computer. It only speeded the code up 4 times. After this the speed actually decreased as it never seemed to reach all cores. I am not sure if this does something else and actually use all cores then?
|
0

Parallel.ForEach is terrific for partionable problems: that is, problems you can split into entirely distinct subproblems. Its first parameter is an enumerable collection with each item in the collection representing a partition.

Sorting a single list of items is not partitionable. Your parallelized code just repeats the sort a whole mess of times.

By the way, consider using the list.Sort() method of dotnet for doing your sorts. Very smart people have optimized it. When the items being sorted are not simple strings or scalars, you can provide a comparison function.

3 Comments

The problem with List<T>.Sort is he's sorting by a key derived from his list item. Enumerable.OrderBy is smart about keeping a table of the keys, but you don't get any such help with List<T>.Sort
Yes that is true. .Sort() does not work in this case.
As an ideá can we sort "plusLIST" and "minusLIST" separately on 2 cores for example using any Parallel approach and sum up the sorted list after this. Assuming I would create 100 containers later?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.