ORNL is managed by UT-Battelle for the US Department of Energy
UCX: An Open Source Framework for HPC Network APIs and Beyond
Presented by: Pavel Shamis / Pasha
Co-Design Collaboration
Collaborative Effort: Industry, National Laboratories and Academia
The Next Generation HPC Communication Framework
Challenges
•  Performance Portability (across various interconnects)
–  Collaboration between industry and research institutions
•  …but mostly industry (because they built the hardware)
•  Maintenance
–  Maintaining a network stack is time-consuming and expensive
–  Industry has the resources and a strategic interest in this
•  Extensibility
–  MPI+X+Y?
–  The exascale programming environment is still under debate
Challenges (CORAL)
How does Summit compare to Titan? (Source slide: SC’14, Summit, Bland)
Feature | Summit | Titan
Application Performance | 5-10x Titan | Baseline
Number of Nodes | ~3,400 | 18,688
Node performance | > 40 TF | 1.4 TF
Memory per Node | >512 GB (HBM + DDR4) | 38 GB (GDDR5 + DDR3)
NVRAM per Node | 800 GB | 0
Node Interconnect | NVLink (5-12x PCIe 3) | PCIe 2
System Interconnect (node injection bandwidth) | Dual Rail EDR-IB (23 GB/s) | Gemini (6.4 GB/s)
Interconnect Topology | Non-blocking Fat Tree | 3D Torus
Processors | IBM POWER9, NVIDIA Volta™ | AMD Opteron™, NVIDIA Kepler™
File System | 120 PB, 1 TB/s, GPFS™ | 32 PB, 1 TB/s, Lustre®
Peak power consumption | 10 MW | 9 MW
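One way to read the comparison as a networking challenge is to relate node injection bandwidth to node performance. The sketch below uses only the figures listed for Summit and Titan; the function names and the "GB/s per TF" ratio are illustrative choices rather than metrics from the slides, and the Summit inputs are the slide's approximations and lower bounds.

```python
# Figures copied from the Summit vs. Titan comparison; the Summit entries
# are approximations or lower bounds ("~3,400" nodes, "> 40 TF" per node).
SUMMIT = {"nodes": 3400, "node_tf": 40.0, "injection_gb_s": 23.0}
TITAN = {"nodes": 18688, "node_tf": 1.4, "injection_gb_s": 6.4}


def aggregate_pf(system):
    """Aggregate node performance in PF: nodes x TF per node / 1000."""
    return system["nodes"] * system["node_tf"] / 1000.0


def injection_per_tf(system):
    """Node injection bandwidth (GB/s) available per TF of node performance."""
    return system["injection_gb_s"] / system["node_tf"]


if __name__ == "__main__":
    for name, system in (("Summit", SUMMIT), ("Titan", TITAN)):
        print(f"{name}: {aggregate_pf(system):.1f} PF aggregate, "
              f"{injection_per_tf(system):.2f} GB/s per TF")
```

With these inputs, Titan has roughly 4.6 GB/s of injection bandwidth per TF of node performance, while Summit has at most about 0.58 GB/s per TF. Fewer, much larger nodes therefore put more pressure on the efficiency of the communication software.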
UCX – Unified Communication X Framework
•  Unified
–  Network API for multiple network architectures that targets HPC programming models and libraries
•  Communication
–  How to move data from location A in memory to location B in memory, considering multiple types of memory
•  Framework
–  A collection of libraries and utilities for HPC network programmers
History
MXM
●  Developed by Mellanox Technologies
●  HPC communication library for InfiniBand devices and shared memory
●  Primary focus: MPI, PGAS
PAMI
●  Developed by IBM on BG/Q, PERCS, IB VERBS
●  Network devices and shared memory
●  MPI, OpenSHMEM, PGAS, CHARM++, X10
●  C++ components
●  Aggressive multi-threading with contexts
●  Active Messages
●  Non-blocking collectives with hardware acceleration support
UCCS
●  Developed by ORNL, UH, UTK
●  Originally based on Open MPI BTL and OPAL layers
●  HPC communication library for InfiniBand, Cray Gemini/Aries, and shared memory
●  Primary focus: OpenSHMEM, PGAS
●  Also supports: MPI
Decades of community and industry experience in development of HPC software
What we are doing differently…
•  UCX consolidates multiple industry and academic efforts
–  Mellanox MXM, IBM PAMI, ORNL/UTK/UH UCCS, etc.
•  Supported and maintained by industry
–  IBM, Mellanox, NVIDIA, Pathscale
What we are doing differently…
•  Co-design effort between national laboratories, academia, and industry
Co-design across the stack:
–  Applications: LAMMPS, NWCHEM, etc.
–  Programming models: MPI, PGAS/Gasnet, etc.
–  Middleware
–  Driver and Hardware
UCX
[Diagram: UCX between applications and networks]
Applications: MPI, GasNet, PGAS, Task Based Runtimes, I/O
UCX: Transports, Protocols, Services
Networks and memory: InfiniBand, uGNI, Shared Memory, GPU Memory, Emerging Interconnects
A Collaborative Effort
•  Mellanox co-designs the network interface and contributes MXM technology
–  Infrastructure, transport, shared memory, protocols, integration with OpenMPI/SHMEM, MPICH
•  ORNL co-designs the network interface and contributes the UCCS project
–  InfiniBand optimizations, Cray devices, shared memory
•  NVIDIA co-designs high-quality support for GPU devices
–  GPUDirect, GDR copy, etc.
•  IBM co-designs the network interface and contributes ideas and concepts from PAMI
•  UH/UTK focus on integration with their research platforms
Licensing
•  Open Source
–  BSD 3-Clause license
–  Contributor License Agreement, based on BSD 3-Clause
UCX Framework Mission
•  Collaboration between industry, laboratories, and academia
•  Create an open-source, production-grade communication framework for HPC applications
•  Enable the highest performance through co-design of software-hardware interfaces
•  Unify industry, national laboratory, and academia efforts
Performance oriented: Optimization for low software overhead in the communication path allows near native-level performance
Community driven: Collaboration between industry, laboratories, and academia
Production quality: Developed, maintained, tested, and used by industry and the research community
API: Exposes broad semantics that target data-centric and HPC programming models and applications
Research: The framework concepts and ideas are driven by research in academia, laboratories, and industry
Cross platform: Support for InfiniBand, Cray, various shared memory (x86-64 and Power), GPUs
Co-design of Exascale Network APIs
Architecture
UCX Framework
UC-S for Services
This framework provides basic infrastructure for component-based programming, memory management, and useful system utilities.
Functionality: Platform abstractions, data structures, debug facilities.
UC-T for Transport
Low-level API that exposes basic network operations supported by the underlying hardware. Reliable, out-of-order delivery.
Functionality: Setup and instantiation of communication operations.
UC-P for Protocols
High-level API that uses the UC-T framework to construct protocols commonly found in applications.
Functionality: Multi-rail, device selection, pending queue, rendezvous, tag-matching, software atomics, etc.
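Tag-matching is one of the protocols listed for UC-P. As a rough illustration of the general technique (not the UCX implementation or its API), the sketch below keeps a queue of posted receives and a queue of unexpected messages, and matches on masked tag bits. The class name, method names, and the tag/mask convention are all assumptions made for this example.

```python
from collections import deque


class TagMatcher:
    """Toy receiver-side tag matching with expected and unexpected queues."""

    def __init__(self):
        # Posted receives still waiting for a message: (tag, mask, callback).
        self.expected = deque()
        # Messages that arrived before a matching receive: (tag, payload).
        self.unexpected = deque()

    @staticmethod
    def _matches(msg_tag, recv_tag, mask):
        # Only the bits selected by the mask take part in the comparison.
        return (msg_tag & mask) == (recv_tag & mask)

    def post_recv(self, tag, mask, callback):
        # Check messages that already arrived, oldest first.
        for i, (msg_tag, payload) in enumerate(self.unexpected):
            if self._matches(msg_tag, tag, mask):
                del self.unexpected[i]
                callback(msg_tag, payload)
                return
        self.expected.append((tag, mask, callback))

    def on_message(self, msg_tag, payload):
        # Check posted receives in the order they were posted.
        for i, (tag, mask, callback) in enumerate(self.expected):
            if self._matches(msg_tag, tag, mask):
                del self.expected[i]
                callback(msg_tag, payload)
                return
        self.unexpected.append((msg_tag, payload))
```

A receive posted with a partial mask (for example `0xF0`) accepts any message whose tag agrees on those bits, which is how wildcard-style receives are commonly expressed.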
A High-level Overview
Applications
–  MPICH, Open-MPI, etc.
–  OpenSHMEM, UPC, CAF, X10, Chapel, etc.
–  Parsec, OCR, Legion, etc.
–  Burst buffer, ADIOS, etc.
UC-P (Protocols) - High Level API
Transport selection, cross-transport multi-rail, fragmentation, operations not supported by hardware
–  Message Passing API Domain: tag matching, rendezvous
–  PGAS API Domain: RMAs, Atomics
–  Task Based API Domain: Active Messages
–  I/O API Domain: Stream
UC-T (Hardware Transports) - Low Level API
RMA, Atomic, Tag-matching, Send/Recv, Active Message
–  Transport for InfiniBand VERBs driver: RC, UD, XRC, DCT
–  Transport for intra-node host memory communication: SYSV, POSIX, KNEM, CMA, XPMEM
–  Transport for Accelerator Memory communication: GPU
–  Transport for Gemini/Aries drivers: GNI
UC-S (Services)
Common utilities: Utilities, Data structures, Memory Management
Drivers: OFA Verbs Driver, Cray Driver, OS Kernel, Cuda
Hardware
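The overview assigns transport selection and "operations not supported by hardware" to UC-P. The sketch below illustrates that idea in a generic way: pick the cheapest reachable transport that offers an operation natively, and otherwise fall back to emulating it over active messages. The transport names are taken from the overview, but the capability sets, the cost values, and the `select_transport` function are hypothetical and do not describe the real UCX selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transport:
    name: str
    ops: frozenset       # operations assumed to be native on this transport
    intra_node: bool     # True if it only connects processes on one node
    cost: float          # hypothetical ranking value, lower is preferred


# Names come from the overview; capabilities and costs are made up.
TRANSPORTS = [
    Transport("posix", frozenset({"am", "put", "get"}), True, 0.2),
    Transport("rc", frozenset({"am", "put", "get", "atomic"}), False, 1.0),
    Transport("ud", frozenset({"am"}), False, 1.5),
]


def select_transport(transports, op, same_node):
    """Return (transport, mode), where mode is "native" or "emulated"."""
    usable = [t for t in transports if same_node or not t.intra_node]
    native = [t for t in usable if op in t.ops]
    if native:
        return min(native, key=lambda t: t.cost), "native"
    # No usable transport implements the operation: emulate it in software
    # on top of active messages ("am"), as a protocol layer might.
    carriers = [t for t in usable if "am" in t.ops]
    if not carriers:
        raise LookupError(f"no usable transport for {op!r}")
    return min(carriers, key=lambda t: t.cost), "emulated"
```

For example, `select_transport(TRANSPORTS, "put", same_node=True)` picks the shared-memory entry, while a request for `"atomic"` over a list containing only `ud` for remote peers falls back to the emulated path.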
Preliminary Evaluation (UCT)
•  Two HP ProLiant DL380p Gen8 servers
•  Intel Xeon E5-2697 2.7GHz CPUs
•  Mellanox SX6036 switch
•  Single-port Mellanox Connect-IB FDR (10.10.5056)
•  Mellanox OFED 2.4-1.0.4 (VERBS)
•  Prototype implementation of Accelerated VERBS (AVERBS)
OpenSHMEM and OSHMEM (OpenMPI)
Put Latency (shared memory)
[Figure: X axis: Message Size (8 to 4MB). Y axis: Latency (usec, log scale), 0.1 to 1000. Series: OpenSHMEM−UCX (intranode), OpenSHMEM−UCCS (intranode), OSHMEM (intranode). Lower is better.]
OpenSHMEM and OSHMEM (OpenMPI)
Put Injection Rate (Connect-IB)
[Figure: X axis: Message Size (8 to 4KB). Y axis: Message Rate (put operations/second), 0 to 1.4e+07. Series: OpenSHMEM−UCX (mlx5), OpenSHMEM−UCCS (mlx5), OSHMEM (mlx5), OSHMEM−UCX (mlx5). Higher is better.]
OpenSHMEM and OSHMEM (OpenMPI)
GUPS Benchmark (Connect-IB)
[Figure: X axis: Number of PEs (two nodes), 2 to 16. Y axis: GUPS (billion updates per second), 0 to 0.0018. Series: UCX (mlx5), OSHMEM (mlx5). Higher is better.]
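For readers unfamiliar with the metric, GUPS counts random read-modify-write updates to a table, in billions per second. The sketch below is a single-process toy version of such a kernel, intended only to show how the number is derived. It is not the benchmark used for the results above: the table size, the update count, and the random number generator (a 64-bit linear congruential generator with Knuth's MMIX constants) are assumptions.

```python
import time


def gups_kernel(table_bits, n_updates, seed=1):
    """Run XOR updates at pseudo-random table indices; return (table, GUPS)."""
    size = 1 << table_bits
    table = list(range(size))
    x = seed
    start = time.perf_counter()
    for _ in range(n_updates):
        # 64-bit LCG step; a stand-in for the benchmark's own random stream.
        x = (x * 6364136223846793005 + 1442695040888963407) & 0xFFFFFFFFFFFFFFFF
        # The top table_bits bits of x select the entry to update.
        table[x >> (64 - table_bits)] ^= x
    elapsed = time.perf_counter() - start
    # Giga-updates per second: updates / seconds / 1e9.
    return table, n_updates / elapsed / 1e9


if __name__ == "__main__":
    _, gups = gups_kernel(table_bits=16, n_updates=200_000)
    print(f"{gups:.6f} GUPS")
```

In the distributed setting measured above, the table is spread across PEs, so most updates become small remote operations and the result is dominated by network message rate and latency.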
MPICH - Message rate – Preliminary Results
[Figure: non-blocking tag-send on Connect-IB. X axis: message size (1 to 4M). Y axis: MMPS (million messages per second), 0 to 6. Series: MPICH/UCX, MPICH/MXM.]
Slide courtesy of Pavan Balaji, ANL - sent to the ucx mailing list
Where is UCX being used?
•  Upcoming release of Open MPI 2.0
•  Upcoming release of MPICH
•  OpenSHMEM reference implementation
•  PARSEC – runtime used by scientific linear algebra libraries
What Next?
•  UCX Consortium!
–  http://www.csm.ornl.gov/newsite/
•  UCX Specification
–  Early draft is available online: http://www.openucx.org/early-draft-of-ucx-specification-is-here/
•  Production releases
–  MPICH, Open MPI, OpenSHMEM(s), Gasnet, and more…
•  Support for more networks, applications, and libraries
•  UCX Hackathon 2016!
–  Will be announced on the mailing list and website
https://github.com/orgs/openucx
WEB: www.openucx.org
Contact: info@openucx.org
Mailing List: https://elist.ornl.gov/mailman/listinfo/ucx-group, ucx-group@elist.ornl.gov
Acknowledgments
•  Thanks to all our partners!
Questions?
Unified Communication - X Framework
