“The Pacific Research Platform:
Building a Distributed Big-Data Machine-Learning
Cyberinfrastructure”
Briefing
Chancellor’s Council
University of California San Diego
May 13, 2019
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
The Unrelenting Exponential Decrease in Cost of Generating Data
Has Led to the Need for a Big Data Cyberinfrastructure
One Million
Times
Cheaper
UC San Diego’s Calit2 & SDSC Have Pioneered Big-Data Cyberinfrastructure for 17 Years
2002-2009: OptIPuter and Quartzite
OptIPuter
$13.5M
PI Smarr,
Co-PI DeFanti
Co-PI Papadopoulos, Ellisman
2002-2009
Quartzite
$1.2M
PI Papadopoulos,
Co-PI Smarr
2004-2007
2013-2015: Creating a “Big Data” Backplane on Campus:
NSF Funded Prism@UCSD and CHERuB
Prism@UCSD, $500,000, Phil Papadopoulos, SDSC, Calit2, PI; Smarr co-PI
CHERuB, $500,000, Mike Norman, SDSC PI
CHERuB
(GDC)
2015-2020: The Pacific Research Platform Connects Campus “Big Data Freeways”
to Create a Regional End-to-End Science-Driven “Big Data Superhighway” System
NSF CC*DNI Grant
$6M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2/QI,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
Source: John Hess, CENIC
UCOP CIO Tom Andriola
Provided Funds and ITLC Support
for Using Ten UC Campuses
For Advanced Technology Testing
2017-2020: CHASE-CI Adds
Machine-Learning to the Data-Science Community Cyberinfrastructure
Caltech
UCB
UCI UCR
UCSD
UCSC
Stanford
MSU
UCM
SDSU
NSF Grant for 256 High Speed “Cloud” GPUs
For 32 ML Faculty & Their Students at 10 Campuses
To Train AI Algorithms on Big Data
PRP Engineers Designed and Built Several Generations
of Optical-Fiber Big-Data Flash I/O Network Appliances (FIONAs)
UCSD-Designed FIONAs Solved the Disk-to-Disk Data Transfer Problem
at Near Full Speed on Best-Effort 10G, 40G and 100G Networks
FIONAs Designed by UCSD’s Phil Papadopoulos, John Graham,
Joe Keefe, and Tom DeFanti
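As a rough illustration of what "near full speed" means at these line rates, the sketch below computes the ideal (lower-bound) time to move one terabyte over 10G, 40G and 100G links. The function name and the 1 TB payload are illustrative assumptions, not figures from the slides, and real disk-to-disk transfers will be somewhat slower than this bound.

```python
def ideal_transfer_seconds(size_terabytes, link_gbps):
    """Lower bound on transfer time: payload bits divided by line rate.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits/s) and
    ignores protocol overhead, disk limits, and congestion.
    """
    bits = size_terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9)


# Hypothetical 1 TB payload on each of the link speeds named on the slide.
for gbps in (10, 40, 100):
    secs = ideal_transfer_seconds(1.0, gbps)
    print(f"{gbps}G: {secs:.0f} s ({secs / 60:.1f} min) per TB")
```

Under these assumptions a terabyte needs at least 800 s at 10G, 200 s at 40G and 80 s at 100G, which is the ceiling a data transfer node is measured against.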
FIONette—
1G, $250
Used for
Training 50
Engineers in
2018-2019
Two FIONA DTNs at UC Santa Cruz: 40G & 100G
Up to 200 TeraByte Rotating Storage
Add Up to 8 Nvidia GPUs Per FIONA
To Add Machine Learning Capability
Over 100 FIONAs Now Deployed on PRP
48 GPUs for
OSG Applications
UCSD Adding >350 Game GPUs to Data Sciences Cyberinfrastructure -
Devoted to Data Analytics and Machine Learning
SunCAVE 70 GPUs
FIONA with
8-Game GPUs
32 GPUs
for Research
ECE Dept
CHASE-CI Grant:
96 GPUs at UCSD
for Training AI Algorithms on Big Data
Plus 288 64-bit GPUs
On SDSC’s Comet
108 GPUs
for Students
Toward an “AI University”
Original PRP
CENIC/PW Link
2018-2019: National-Scale Pilot -
Using CENIC & Internet2 to Connect Quilt Regional R&E Networks
Announced May 8, 2018
Internet2 Global Summit
“Towards
The NRP”
3-Year Grant
Funded
by NSF
$2.5M
October 2018
PI Smarr
Co-PIs Altintas
Papadopoulos
Wuerthwein
Rosing
NRP Pilot
NSF CENIC Link
2018-2019: PRP Game Changer!
Using Kubernetes to Orchestrate Containers Across the PRP
“Kubernetes is a way of stitching together
a collection of machines into,
basically, a big computer,”
--Craig McLuckie, Google
and now CEO and Founder of Heptio
"Everything at Google runs in a container."
--Joe Beda, Google
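To make the "big computer" idea concrete, here is a minimal sketch of how a user might describe a GPU job to a Kubernetes cluster: the user states what the container needs, and the scheduler picks a machine. The pod name and image below are hypothetical, and the `nvidia.com/gpu` resource name assumes the cluster runs the NVIDIA device plugin; the slides do not say how PRP configures this.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Kubernetes Pod manifest (as a dict) requesting `gpus` GPUs.

    No node is named: the Kubernetes scheduler places the pod on any
    machine in the cluster with enough free GPUs.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumes the NVIDIA device plugin exposes this resource.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical name and image, for illustration only.
manifest = gpu_pod_manifest("train-demo", "example.org/ml/train:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML manifests.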
100G NVMe 6.4TB
Caltech
40G 192TB
UCSF
40G 160TB HPWREN
40G 160TB
4 FIONA8s
Calit2/UCI
35 FIONA2s
12 FIONA8s
2x40G 160TB HPWREN
UCSD
100G Epyc NVMe
100G Gold NVMe
8 FIONA8s + 5 FIONA8s
SDSC @ UCSD
1 FIONA8
40G 160TB
UCR 40G 160TB
USC
100G NVMe 6.4TB
2x40G 160TB
UCLA
1 FIONA8
40G 160TB
Stanford U
2 FIONA8s
40G 192TB
UCSB
4.5 FIONA8s
100G NVMe 6.4TB
40G 160TB
UCSC
California-Connected by CENIC
PRP Kubernetes “Nautilus” Hypercluster
10 FIONA2s
1 FIONA8
40G 160TB
UCM
100Gb/s HPR
13 Campus Nautilus Cluster:
3300 CPU Cores 82 Hosts
~4 PB Storage
>350 GPUs: >30M core/hrs/day
40G 160TB HPWREN
100G NVMe 6.4TB
1 FIONA8 2 FIONA4s
FPGAs + 2PB BeeGFS
SDSU
PRP Disks
10G 3TB
CSUSB
Minority Serving Institution
CHASE-CI
100G 48TB
NPS
CENIC/PW Link
40G 3TB
U Hawaii
40G 160TB
NCAR-WY
40G 192TB
U Washington
100G FIONA
I2 Chicago
100G FIONA
I2 Kansas City
10G FIONA1
40G FIONA
UIC
100G FIONA
I2 NYC
40G 3TB
StarLight
United States PRP/TNRP Nautilus Hypercluster
Now Connects 3 More Regionals and 3 Internet2 Sites
Global PRP Nautilus Hypercluster Is Rapidly Adding International Partners
Beyond Our Original Partner in Amsterdam
PRP
Guam
Australia
Korea
Singapore
Netherlands
10G 35TB
UvA
40G FIONA6
40G 28TB
KISTI
10G (coming)
U of Guam
100G 35TB
U of Queensland
Transoceanic Nodes Show Distance is Not the Barrier
to Above 5Gb/s Disk-to-Disk Performance
PRP’s Current
International
Partners
PRP is Science-Driven:
Connecting Multi-Campus Application Teams and Devices
Earth
Sciences
UC San Diego UC Berkeley UC Merced
Director: F. Martin Ralph
Big Data Collaboration with:
Source: Scott Sellers, PhD CHRS; Postdoc CW3E
Collaboration on Atmospheric Water in the West
Between UC San Diego and UC Irvine
Director, Soroosh Sorooshian, UCSD
Calit2’s FIONA
SDSC’s COMET
Calit2’s FIONA
Pacific Research Platform (10-100 Gb/s)
GPUs
GPUs
Complete Workflow Time: 19.2 Days → 52 Minutes!
UC Irvine UC San Diego
PRP Sped Up Scott Sellers’ Workflow
by Over 500 Times!
Source: Scott Sellers, US State Dept.
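The "over 500 times" figure can be checked directly from the two workflow times on this slide (19.2 days before, 52 minutes after). The helper name below is illustrative.

```python
def speedup(before_days, after_minutes):
    """Ratio of the old workflow time to the new one, both in minutes."""
    before_minutes = before_days * 24 * 60
    return before_minutes / after_minutes


# Figures taken from the slide: 19.2 days versus 52 minutes.
ratio = speedup(19.2, 52)
print(f"Speedup: about {ratio:.0f}x")
```

19.2 days is 27,648 minutes, so the ratio comes out a little above 530, consistent with the slide's claim.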
OSG IceCube Usage on PRP (Purple Segment) 3/9/19:
Using 126 GPUs + 142 CPUs + 49 GB RAM
GPU Simulations Needed to Improve Ice Model.
=> Results in Significant Improvement in Pointing Resolution
for Multi-Messenger Astrophysics
IceCube
UCSD’s ITS Adapted PRP FIONA8s
To Support Data Science Courses
Instructional Data Science
Machine Learning Platform:
Instead of Spending
~$20,000/Quarter/Course on
Commercial Clouds:
97 Courses over 6 Quarters 
$4M vs. $240K over 12 Quarters
At least 20,000 Students
Adam Tilghman, ITS
Source: UCSD ITS
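One way to read the cost comparison above is sketched below. It assumes that "97 Courses over 6 Quarters" counts course offerings, that each would have cost about $20,000 on a commercial cloud, and that usage continues at the same rate for 12 quarters; those assumptions are an interpretation, not something the slide states.

```python
CLOUD_COST_PER_COURSE = 20_000   # ~$20,000/quarter/course (from the slide)
COURSES_OBSERVED = 97            # courses over the observed period
QUARTERS_OBSERVED = 6
QUARTERS_PROJECTED = 12
PLATFORM_COST = 240_000          # $240K over 12 quarters (from the slide)

# Cloud spend for the observed 6 quarters, then a linear projection to 12.
cloud_observed = CLOUD_COST_PER_COURSE * COURSES_OBSERVED
cloud_projected = cloud_observed * QUARTERS_PROJECTED / QUARTERS_OBSERVED

print(f"Cloud, 6 quarters:  ${cloud_observed:,.0f}")
print(f"Cloud, 12 quarters: ${cloud_projected:,.0f}")
print(f"Cloud vs. platform: {cloud_projected / PLATFORM_COST:.1f}x")
```

Under this reading the projected cloud cost is about $3.9M, which rounds to the slide's $4M and is roughly 16 times the $240K platform cost.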
The Student GPUs
Have Supported a Broad Set of Courses Across Campus
Source: UCSD ITS
The ITS GPUs
Have Supported Thousands of Students
Source: UCSD ITS
Student GPU Demand Is Variable
Allowing for Other Student Uses
Available to Support:
Independent Study,
For-Credit Research,
External Barter
Source: UCSD ITS
PRP Actively Develops Diversity
• Grants
– 3 Female co-PIs
– 1 Hispanic co-PI
• Campuses
– 8 Minority-Serving Institutions in PRP/CHASE-CI
• Workshops
– NRPII Workshop Steering Committee 80% Female
– Multiple MSI, EPSCoR Focused Workshops
Jackson State University
PRP MSI Workshop
Presenting
FIONettes
