Multithreading and Parallelization
Dmitri Nesteruk
dmitrinesteruk@gmail.com | http://nesteruk.org/seminars
Agenda
 Overview
 Multithreading
   PowerThreading (AsyncEnumerator)
 Multi-core parallelization
   Parallel Extensions to .NET Framework
 Multi-computer parallelization
   PureMPI.NET
Why now?
 Manycore paradigm shift
   CPU speeds reach production challenges
   (not at the limit yet)

   growth
 Processor features
   Hyper-threading
   SIMD
CPU Scope
 Past: more transistors per chip
    Yesterday: 1x-core
 Present: more cores per chip
    Today: 2x-core norm, 4x-
 Future: even more cores per chip; NUMA & other specialties
    Tomorrow: 32x-core?
Machine Scope
 Most clients are concerned with one-machine use (Machine)
 Clustering helps leverage performance (Cluster)
 Clouds (Cloud)
Multithreading vs. Parallelization
 Multithreading
    Using threads/thread pool to perform async
    operations
    Explicit (# of threads known)
 Parallelization
    Implicit parallelization
    No explicit thread operation
Ways to Parallelize/Multithread
 Managed
    System.Threading
    Parallel Extensions
    Libraries
 Unmanaged
    OpenMP
    Libraries
 Specialized
    GPGPU
    FPGA
Managed
 System.Threading
 Libraries
   Parallel Extensions (TPL + PLINQ)
   PowerThreading
 Languages/frameworks
   Sing#, CCR
 Remoting, WCF, MPI.NET, PureMPI.NET, etc.
   Use over many machines
Unmanaged
 OpenMP
 – #pragma directives in C++ code
 Intel multi-core libraries
   Threading Building Blocks (low-level)
   Integrated Performance Primitives
   Math Kernel Library (also has MPI support)
 MPI, PVM, etc.
   Use over many machines
Specialized Ex. (Intrinsic Parallelization)
  GPU Computation (GPGPU)
    Calculations on the graphics card
    Uses programmable pixel shaders
    See, e.g., NVidia CUDA, GPGPU.org
  FPGA
    Hardware-specific solutions
    E.g., in-socket accelerators
    Requires HDL programming & custom hardware
Part I

Multithreading: a look at
AsyncEnumerator
Multithreading
 Goals
   Do stuff concurrently
   Preserve safety/consistency
 Tools
   Threads
   ThreadPool
   Synchronization objects
   Framework async APIs
A Look at Delegates
 Making a delegate for a function is easy
 Given void a() { … }
  – ThreadStart del = a;
 Given void a(int n) { … }
  – Action<int> del = a;
 Given float a(int n, double m) {…}
  – Func<int, double, float> del = a;
 Otherwise, make your own!
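The conversions above can be tried in a small console program. This is a minimal sketch: the methods A, B and C and the custom TryParseInt delegate type are hypothetical names made up for illustration, and the custom delegate shows one case (an out parameter) that Action<> and Func<> cannot express.

```csharp
using System;
using System.Threading;

public static class DelegateDemo
{
    // Hypothetical sample methods matching the three signatures on the slide.
    static int calls;
    static void A() { calls++; }
    static void B(int n) { calls += n; }
    static float C(int n, double m) { return (float)(n * m); }

    // "Make your own": a delegate type with an out parameter.
    public delegate bool TryParseInt(string text, out int value);

    public static float Run()
    {
        calls = 0;
        ThreadStart d1 = A;                 // void ()
        Action<int> d2 = B;                 // void (int)
        Func<int, double, float> d3 = C;    // float (int, double)
        TryParseInt d4 = int.TryParse;      // bool (string, out int)

        d1();                               // calls becomes 1
        d2(2);                              // calls becomes 3
        int parsed;
        d4("40", out parsed);               // parsed becomes 40
        return d3(calls, 0.5) + parsed;     // 1.5 + 40
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```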
Delegate Methods
 Invoke()
   Synchronous, blocks your thread
 BeginInvoke
   Executes in ThreadPool
   Returns IAsyncResult
 EndInvoke
   Waits for completion
   Takes the IAsyncResult from BeginInvoke
Usage
 Fire and forget
  – del.BeginInvoke(null, null);
 Fire, and wait until done
  – IAsyncResult ar = del.BeginInvoke(null,null);
    …
    del.EndInvoke(ar);
 Fire, and call a function when done
  – del.BeginInvoke(firedWhenDone, null);
                      Callback parameter
WaitAny and WaitAll
 To wait until either delegate completes
  – WaitHandle.WaitAny(
      new WaitHandle[] {
        ar1.AsyncWaitHandle,
        ar2.AsyncWaitHandle
      }); // wait until either completes
 To wait until all delegates complete
    Use WaitAll instead of WaitAny
  – WaitAll is [MTAThread]-specific; on an STA thread, use Monitor.Pulse & Monitor.Wait instead
Example
Execute a() and b() in parallel; wait on both

ThreadStart delA = a;
ThreadStart delB = b;
IAsyncResult arA = delA.BeginInvoke(null, null);
IAsyncResult arB = delB.BeginInvoke(null, null);
WaitHandle.WaitAll(new [] {
  arA.AsyncWaitHandle,
  arB.AsyncWaitHandle });
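Delegate.BeginInvoke is only available on the .NET Framework; .NET Core and later throw PlatformNotSupportedException. The same "start two operations, wait on both" pattern can be sketched with the thread pool and explicit events instead. The Start helper below is a hypothetical name, and the two sleeps merely stand in for a() and b().

```csharp
using System;
using System.Threading;

public static class WaitHandleDemo
{
    // Queue work on the thread pool and return a handle that is
    // signalled when the work finishes (hypothetical helper).
    static WaitHandle Start(Action work)
    {
        var finished = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { work(); }
            finally { finished.Set(); }
        });
        return finished;
    }

    public static int RunBoth()
    {
        int counter = 0;
        WaitHandle[] handles =
        {
            Start(() => { Thread.Sleep(50); Interlocked.Increment(ref counter); }),
            Start(() => { Thread.Sleep(10); Interlocked.Increment(ref counter); })
        };

        int index = WaitHandle.WaitAny(handles);  // returns once any handle is signalled
        WaitHandle.WaitAll(handles);              // returns once both are signalled
        Console.WriteLine("WaitAny returned index {0}", index);
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(RunBoth());
    }
}
```

A console Main runs on an MTA thread by default, so WaitAll is allowed here.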
LINQ Example
Execute a() and b() in parallel; wait on both
WaitHandle.WaitAll(
  new ThreadStart[] { a, b }              // make an array of delegates
  .Select(f => f.BeginInvoke(null, null)  // call each delegate
                .AsyncWaitHandle)         // get a wait handle of each
  .ToArray());                            // convert from IEnumerable to array
Asynchronous Programming Model (APM)
 Basic goal
  – IAsyncResult ar =
      del.BeginXXX(null,null);
    …
    del.EndXXX(ar);
 Supported by Framework classes, e.g.,
  – FileStream
  – WebRequest
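As an illustration of the Begin/End pair on a Framework class, here is a minimal sketch using FileStream.BeginRead and EndRead. The temp file, the 1024-byte buffer and the helper names are assumptions made for the example; a real reader would loop until EndRead returns 0.

```csharp
using System;
using System.IO;
using System.Text;

public static class ApmDemo
{
    public static string ReadAsync(string path)
    {
        var buffer = new byte[1024];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, useAsync: true))
        {
            // BeginXXX starts the operation and returns immediately.
            IAsyncResult ar = fs.BeginRead(buffer, 0, buffer.Length, null, null);

            // ... other work could run here ...

            // EndXXX blocks until completion and returns the result.
            int read = fs.EndRead(ar);
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }

    // Writes a small temp file and reads it back through the APM calls.
    public static string RoundTrip(string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text);
            return ReadAsync(path);
        }
        finally { File.Delete(path); }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("hello APM"));
    }
}
```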
Difficulties
  Async calls do not always succeed
    Timeout
    Exceptions
    Cancelation
  Results in too many functions/anonymous
  delegates
    Async workflow code becomes difficult to read
PowerThreading
 A free library from Wintellect (Jeffrey Richter)
 Get it at wintellect.com
 Also check out PowerCollections
 Resource locks
    ReaderWriterGate
 Async. prog. model
    AsyncEnumerator
    SyncGate
 Other features
    IO
    State manager
    NumaInformation :)
AsyncEnumerator
 Simplifies APM programming
 No need to manually manage
 IAsyncResult cookies
 Fewer functions, cleaner code
Usage patterns
 1 async op → process
 X async ops → process all
 X async ops → process each one as it
 completes
 X async ops → process some, discard the rest
 X async ops → process some until
 cancellation/timeout occurs, discard the rest
AsyncEnumerator Basics
 Has three methods
   Execute(IEnumerator<Int32>)
   BeginExecute
   EndExecute
 Also exists as AsyncEnumerator<T> when a
 return value is required
Inside the Function
internal IEnumerator<Int32> GetFile(
AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Signature
// Function must return IEnumerator<Int32>
// Function must accept AsyncEnumerator as
// one of the parameters (order unimportant)
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Callback
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  // Call the async BeginXXX() methods
  // Pass ae.End() as callback parameter
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Yield
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  // Now yield return the number of pending
  // asynchronous operations
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Wait & Process
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  // Call the async EndXXX() methods
  // Pass ae.DequeueAsyncResult() as parameter
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Usage
 Init the enumerator
  – var ae = new AsyncEnumerator();
 Use it, passing itself as a parameter
  – ae.Execute(GetFile(
      ae, "http://nesteruk.org"));
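To make the mechanism behind "yield return the number of pending operations" more concrete, here is a toy driver. It is not Wintellect's implementation: MiniAsyncEnumerator, MiniDemo and ReadFile are hypothetical names, only End(), DequeueAsyncResult() and Execute() are mimicked, and there is no cancellation, no discard groups and no exception forwarding. It reads a temp file with FileStream.BeginRead so that it runs without network access.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading;

// Toy driver, NOT Wintellect's implementation.
public sealed class MiniAsyncEnumerator
{
    private readonly object gate = new object();
    private readonly Queue<IAsyncResult> results = new Queue<IAsyncResult>();
    private readonly ManualResetEvent finished = new ManualResetEvent(false);
    private IEnumerator<int> iterator;
    private int arrived;   // completed callbacks not yet matched to a yield
    private int needed;    // > 0 only while the iterator is suspended

    // Callback to hand to BeginXXX(); it records the completed operation.
    public AsyncCallback End() { return OnCompleted; }

    public IAsyncResult DequeueAsyncResult()
    {
        lock (gate) { return results.Dequeue(); }
    }

    // Runs the iterator to the end, blocking the calling thread.
    public void Execute(IEnumerator<int> it)
    {
        iterator = it;
        Resume();
        finished.WaitOne();
    }

    private void OnCompleted(IAsyncResult ar)
    {
        bool resume = false;
        lock (gate)
        {
            results.Enqueue(ar);
            arrived++;
            if (needed > 0 && arrived >= needed)
            {
                arrived -= needed;
                needed = 0;
                resume = true;
            }
        }
        if (resume) Resume();   // continue the iterator on the callback thread
    }

    private void Resume()
    {
        while (iterator.MoveNext())
        {
            lock (gate)
            {
                int n = iterator.Current;          // operations to wait for
                if (arrived < n) { needed = n; return; }
                arrived -= n;                      // already done, keep going
            }
        }
        finished.Set();
    }
}

public static class MiniDemo
{
    static string content;

    static IEnumerator<int> ReadFile(MiniAsyncEnumerator ae, string path)
    {
        var buffer = new byte[1024];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, true))
        {
            fs.BeginRead(buffer, 0, buffer.Length, ae.End(), null);
            yield return 1;                        // one pending operation
            int read = fs.EndRead(ae.DequeueAsyncResult());
            content = Encoding.UTF8.GetString(buffer, 0, read);
        }
    }

    public static string Run(string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text);
            var ae = new MiniAsyncEnumerator();
            ae.Execute(ReadFile(ae, path));
            return content;
        }
        finally { File.Delete(path); }
    }

    public static void Main() { Console.WriteLine(Run("hello")); }
}
```

The counter pair (arrived, needed) covers the case where a callback fires before the iterator reaches its yield, which is why the driver does not simply resume on every callback.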
Exception Handling
 Break out of function
  – try {
      resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());
    } catch (WebException e) {
      // process e
      yield break;
    }
 Propagate a parameter
Discard Groups
 Sometimes, you want to ignore the result of
 some calls
   E.g., you already got the data elsewhere
 To discard a group of calls
   Use overloaded End(…) methods to specify
     Group number
     Cleanup delegate
   Call DiscardGroup(…) with group number
Cancellation
 External code can cancel the iterator
  – ae.Cancel(…)
 Or specify a timeout
  – ae.SetCancelTimeout(…)
 Check whether iterator is cancelled with
  – ae.IsCanceled(…)
    just call yield break if it is
Part II

Parallel Extensions to .NET
Framework TPL and PLINQ
Parallelization
 Algorithms vary
    Some are easy to parallelize
    (e.g., matrix multiplication)
    Some not so
    (e.g., matrix inversion)
    Some not at all

 parallelize them
Parallel Extensions to .NET Framework (PFX)
 A library for parallelization
 Consists of
    Task Parallel Library
    Parallel LINQ (PLINQ)
 Currently in CTP stage
 Maybe in .NET 4.0?
Task Parallel Library Features
 System.Linq
    Parallel LINQ
 System.Threading
    Implicit parallelism (Parallel.Xxx)
 System.Threading.Collections
    Thread-safe stack and queue
 System.Threading.Tasks
    Task manager, tasks, futures
System.Threading
 Implicit parallelization (Parallel.For and ForEach)
 Aggregate exceptions
 Other useful classes
 Parallel.For | ForEach
 LazyInit<T>
 WriteOnce<T>
 AggregateException
 Other goodies
Parallel.For
 Parallelizes a for loop
 Instead of

 for (int i = 0; i < 10; ++i) { … }

 We write

 Parallel.For(0, 10, i => { … });
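A runnable version of the loop above might look as follows. It uses Parallel.For as shipped in System.Threading.Tasks in .NET 4 and later; the Squares method is a hypothetical example body.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelForDemo
{
    public static long[] Squares(int count)
    {
        var result = new long[count];
        // Each iteration writes only its own slot, so no locking is needed.
        Parallel.For(0, count, i => { result[i] = (long)i * i; });
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Squares(10)));
    }
}
```

Iterations may run in any order, so the loop body should not depend on earlier iterations having finished.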
Parallel.For Overloads
 Step size
 ParallelState for cancelation
 Thread-local initialization
 Thread-local finalization
 References to a TaskManager
 Task creation options
Parallel.ForEach
 Same features as Parallel.For except
    No counters or steps
 Takes an IEnumerable<T>
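A small sketch of Parallel.ForEach over an IEnumerable<T>, again using the API as released in .NET 4. TotalLength is a hypothetical example; because every iteration updates the same variable, the update goes through Interlocked.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForEachDemo
{
    public static int TotalLength(IEnumerable<string> words)
    {
        int total = 0;
        Parallel.ForEach(words, w =>
        {
            // Shared state must be updated atomically.
            Interlocked.Add(ref total, w.Length);
        });
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalLength(new[] { "task", "parallel", "library" }));
    }
}
```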
Cancelation
 Parallel.For takes an Action<Int32>
 delegate
 Can also take an
 Action<Int32, ParallelState>
   ParallelState keeps track of the state of parallel
   execution
   ParallelState.Stop() stops execution in all threads
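In the released .NET 4 API the state object is called ParallelLoopState rather than ParallelState. Assuming that name, early termination can be sketched as below; FindAny is a hypothetical helper that returns the index of some matching element.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StopDemo
{
    // Returns an index i with data[i] == target, or -1 if there is none.
    public static int FindAny(int[] data, int target)
    {
        int found = -1;
        Parallel.For(0, data.Length, (i, state) =>
        {
            if (data[i] == target)
            {
                Interlocked.Exchange(ref found, i);
                state.Stop();   // ask the loop not to start further iterations
            }
        });
        return found;
    }

    public static void Main()
    {
        var data = new int[1000];
        for (int i = 0; i < data.Length; i++) data[i] = i;
        Console.WriteLine(FindAny(data, 700));
    }
}
```

Stop() is cooperative: iterations that are already running finish normally, so with several matches any one of them may be reported.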
Parallel.For Exceptions
 The AggregateException class holds all
 exceptions thrown
 Created even if only one thread throws
 Used by both Parallel.Xxx and PLINQ
 Original exceptions stored in
 InnerExceptions property.
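A short sketch of catching AggregateException from Parallel.For, using the released .NET 4 API. How many inner exceptions are collected depends on scheduling, since the loop stops starting new iterations once one has failed; CountFailures is a hypothetical name.

```csharp
using System;
using System.Threading.Tasks;

public static class AggregateDemo
{
    public static int CountFailures()
    {
        try
        {
            Parallel.For(0, 8, i =>
            {
                if (i % 2 == 1)
                    throw new InvalidOperationException("odd index " + i);
            });
            return 0;
        }
        catch (AggregateException ex)
        {
            // The original exceptions are available one by one.
            foreach (Exception inner in ex.InnerExceptions)
                Console.WriteLine(inner.Message);
            return ex.InnerExceptions.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountFailures());
    }
}
```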
LazyInit<T>
 Lazy initialization of a single variable
 Options
  – AllowMultipleExecution
    Init function can be called by many threads, only
    one value published
  – EnsureSingleExecution
    Init function executed only once
  – ThreadLocal
    One init call & value per thread
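LazyInit<T> is the CTP name. In released .NET 4 the closest equivalents, as far as I can tell, are Lazy<T> with LazyThreadSafetyMode.PublicationOnly (roughly AllowMultipleExecution), LazyThreadSafetyMode.ExecutionAndPublication (roughly EnsureSingleExecution), and ThreadLocal<T> for the per-thread case. The sketch below assumes that mapping and demonstrates only the single-execution mode.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LazyDemo
{
    // Returns how many times the init function ran under concurrent access.
    public static int InitCallsWithSingleExecution()
    {
        int initCalls = 0;
        var lazy = new Lazy<int>(
            () => { Interlocked.Increment(ref initCalls); return 42; },
            LazyThreadSafetyMode.ExecutionAndPublication);

        // Many threads ask for the value; the init function runs only once.
        Parallel.For(0, 16, i =>
        {
            if (lazy.Value != 42) throw new InvalidOperationException();
        });
        return initCalls;
    }

    public static void Main()
    {
        Console.WriteLine(InitCallsWithSingleExecution());
    }
}
```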
WriteOnce<T>
 Single-assignment structure
 Just like Nullable:
   HasValue
   Value
 Also try methods
   TryGetValue
   TrySetValue
Futures
 A future is the name of a value that will
 eventually be produced by a computation
 Thus, we can decide what to do with the
 value before we know it
Futures of T
• Future is a factory
• Future<T> is the actual future (and also has
  factory methods)
  To make a future
  – var f = Future.Create(() => g());
  To use a future
    Get f.Value
    The accessor does an async computation
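Future<T> did not survive under that name: in released .NET the same role is played by Task<TResult>, with Result in place of Value. A minimal sketch under that assumption follows; G is a hypothetical slow computation, and ContinueWith illustrates deciding what to do with the value before it is known.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FutureDemo
{
    // Hypothetical slow computation standing in for g().
    static int G() { Thread.Sleep(20); return 21 * 2; }

    public static string Run()
    {
        Task<int> f = Task.Run(() => G());   // starts computing right away

        // Decide what happens to the value before it exists.
        Task<string> label = f.ContinueWith(t => "value = " + t.Result);

        // Reading Result blocks the caller until the value is available.
        return label.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```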
Tasks & TaskManager
 A better Thread+ThreadPool combination
 TaskManager
   A very clever thread pool :)
   Adjusts worker threads to # of CPUs/cores
   Keeps all cores busy
 Task
   A unit of work
   May (or may not) run concurrently
 http://channel9.msdn.com/posts/DanielMoth/ParallelFX-Task-and-friends/
Task
 Just like a future, a task takes an Action<T>
  – Task t = Task.Create(DoSomeWork);
    Overloads exist :)
 Fires off immediately. To wait on completion
  – t.Wait();
 Unlike the thread pool, task manager will use
 as many threads as there are cores
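Task.Create is the CTP spelling; in released .NET 4 a task is created and started with Task.Factory.StartNew (or Task.Run from .NET 4.5). A minimal sketch using the released names, with a hypothetical counter as the work item:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskDemo
{
    public static int Run()
    {
        int done = 0;

        // Each task is queued to the scheduler as soon as it is created.
        Task t1 = Task.Factory.StartNew(() => { Interlocked.Increment(ref done); });
        Task t2 = Task.Factory.StartNew(() => { Interlocked.Increment(ref done); });

        t1.Wait();              // wait on a single task
        Task.WaitAll(t1, t2);   // or on several at once
        return done;
    }

    public static void Main()
    {
        Console.WriteLine("Cores: " + Environment.ProcessorCount);
        Console.WriteLine(Run());
    }
}
```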
Parallel LINQ (PLINQ)
 Parallel evaluation in
    LINQ to Objects
    LINQ to XML
 Features
    IParallelEnumerable<T>
    ParallelEnumerable.AsParallel static method
Example
IEnumerable<T> data = ...;
var q = data.AsParallel()
  .Where(x => p(x))
  .OrderBy(x => k(x))
  .Select(x => f(x));

foreach (var e in q)
  a(e);
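A concrete instance of the query above, with hypothetical stand-ins for p, k and f. It uses PLINQ as released in .NET 4, where AsParallel returns ParallelQuery<T> (the CTP type was IParallelEnumerable<T>).

```csharp
using System;
using System.Linq;

public static class PlinqDemo
{
    public static int[] Query(int[] data)
    {
        return data.AsParallel()
            .Where(x => x % 2 == 0)   // p(x): keep even numbers
            .OrderBy(x => -x)         // k(x): largest first
            .Select(x => x * x)       // f(x): square
            .ToArray();
    }

    public static void Main()
    {
        foreach (int e in Query(new[] { 1, 2, 3, 4, 5, 6 }))
            Console.WriteLine(e);
    }
}
```

Without the OrderBy, a parallel query gives no guarantee about the order in which results come back.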
Part III

Interprocess communication with
PureMPI.NET
Message Passing Interface
 An API for general-purpose IPC
 Works across cores & machines
 C++ and Fortran
   Some Intel libraries support explicitly
 http://www.mcs.anl.gov/research/projects/mpich2/
PureMPI.NET
 A free library available at http://purempi.net
 Uses WCF endpoints for communication
 Uses MPI syntax
 Features
   A library DLL for WCF functionality
   An EXE for easy deployment over network
How it works
 Your computers run a service that connects
 them together
 Your program exposes WCF endpoints
 You use the MPI interfaces to communicate
Communicator & Rank
 A communicator is a group of computers
   In most scenarios, you would have one group
   MPI_COMM_WORLD

 comm
   Useful for determining whether we are the
Main
static void Main(string[] args)
{
  // MPIEnvironment: app.config
  using (ProcessorGroup processors =
    new ProcessorGroup("MPIEnvironment",
                       MpiProcess))     // run MpiProcess on all machines
  {
    processors.Start();                 // start each one
    processors.WaitForCompletion();     // wait on all
  }
}
Sending & Receiving
 Blocking or non-blocking methods
   Send/Receive (blocking)
   Begin|End Send/Receive (async)
   Invoked on the comm
Send/Receive
static void MpiProcess(IDictionary<string, Comm> comms)
{
  // Get a default comm from dictionary
  Comm comm = comms["MPI_COMM_WORLD"];
  if (comm.Rank == 0)
  {
    // Get a message from 1 (blocking)
    string msg = comm.Receive<string>(1, string.Empty);
    Console.WriteLine("Got " + msg);
  }
  else if (comm.Rank == 1)
  {
    // Send a message to 0 (also blocking)
    comm.Send(0, string.Empty, "Hello");
  }
}
Extras
 Can use async ops
 Can send to all (Broadcast)
 Can distribute work and then collect it
 (Gather/Scatter)
Thank You!
