GPU OVERVIEW IN
FINANCIAL SERVICES

 ALASTAIR HOUSTON
      COMPUTE
 FSI SALES MANAGER
Agenda

           Nvidia and HPC markets

           GPU Overview

           CUDA and OpenCL

           Current FS deployments




© NVIDIA Corporation 2009
CUDA Runs on NVIDIA GPUs …
Over 80 Million CUDA GPUs Deployed

                    GeForce®              Tesla™                 Quadro®
                 Entertainment   High-Performance Computing   Design & Creation




50x – 150x

  Speedup   Application            Source
  146X      Medical Imaging        U of Utah
  36X       Molecular Dynamics     U of Illinois, Urbana
  18X       Video Transcoding      Elemental Tech
  50X       Matlab Computing       AccelerEyes
  100X      Astrophysics           RIKEN
  149X      Financial simulation   Oxford
  47X       Linear Algebra         Universidad Jaime
  20X       3D Ultrasound          Techniscan
  130X      Quantum Chemistry      U of Illinois, Urbana
  30X       Gene Sequencing        U of Maryland
Options Pricing, Risk Modeling, Algorithmic Trading

          Options pricing uses Monte Carlo
          (MC) simulations

          Random Number Generators (RNG)
          are key to MC

          Up to 100x speed-up in RNGs
          using CUDA

          25-60x overall speedup in Monte
          Carlo simulations
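
As a rough illustration of why random number generation dominates this workload, here is a minimal CPU-only sketch of Monte Carlo pricing for a European call option. It is not taken from the slides: the geometric Brownian motion model, the function names, and the parameter values are all assumptions chosen for illustration. Every path draws its own random number and is independent of the others, which is the property that makes the method a natural fit for a GPU.

```python
import math
import random


def mc_european_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    """Estimate a European call price by Monte Carlo.

    Assumes the asset follows geometric Brownian motion (an illustrative
    modelling choice, not something stated in the slides).
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)            # one normal draw per path
        terminal = spot * math.exp(drift + diffusion * z)
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price, used only as a sanity check."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


if __name__ == "__main__":
    estimate = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=50_000)
    reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print(f"Monte Carlo: {estimate:.3f}  Black-Scholes: {reference:.3f}")
```

The loop body does nothing but generate a random number and evaluate a payoff, so in a sketch like this the quality and speed of the generator largely determine the cost per path.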


Co-Processing




                                 CPU                     GPU

                            The Right Processor for the Right Tasks
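
One common way to reason about co-processing is Amdahl's law: only the portion of a job that is moved to the GPU gets faster, and the remainder still runs at CPU speed. The sketch below is an assumption-laden illustration rather than anything from the slides; the function name and the example fractions are hypothetical.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-job speedup when only part of the job is accelerated.

    accelerated_fraction: share of the original runtime moved to the GPU (0..1).
    kernel_speedup: how much faster that share runs on the GPU.
    Both inputs are illustrative assumptions, not measured values.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical workloads with a 100x faster kernel.
    for fraction in (0.90, 0.99, 1.00):
        print(f"{fraction:.0%} offloaded -> {overall_speedup(fraction, 100.0):.1f}x overall")
```

Under these assumptions, a kernel that is 100x faster yields a whole-job speedup well below 100x unless nearly all of the runtime is offloaded, which is why the serial CPU portion still matters.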
The Performance Gap Widens Further




          [Chart: NVIDIA GPU vs. X86 CPU]

          Chart callouts:
            1 TF Single Precision
            4GB Memory
            8x double precision
            ECC
            L1, L2 Caches
Introducing the ‘Fermi’ Architecture
  The Soul of a Supercomputer in the body of a GPU
          3 billion transistors
          Over 2× the cores (512 total)
          8× the peak DP performance
          ECC
          L1 and L2 caches
          ~2× memory bandwidth (GDDR5)
          Up to 1 Terabyte of GPU memory
          Concurrent kernels
          Hardware support for C++

          [Block diagram labels: HOST I/F, Giga Thread, L2, DRAM I/F]

NVIDIA Compute Products

                            Board Level Products   1U Server Product




                 1 Tesla GPU                       4 Tesla GPUs
                 Workstation Product               Data Center Product
                 OEM Product



CUDA C and OpenCL
Momentum
  Over 100,000,000 installed CUDA-Architecture GPUs
  Over 60,000 GPU Computing Developers (1/09)
  Windows, Linux and MacOS Platforms supported
  GPU Computing spans Consumer applications to HPC
  200+ Universities teaching the CUDA Architecture and GPU Computing

GPU Computing Applications
  C with CUDA Extensions
    Over 60,000 developers
    Running in Production since 2008
    SDK + Libs + Visual Profiler and Debugger
  OpenCL
    1st GPU demo
    Shipped 1st OpenCL Driver
    Strategic developers using NV SW today
  DirectX Compute
    Microsoft's GPU Computing API
    Supports all CUDA-Architecture GPUs since G80 (DX10 and future DX11 GPUs)
  FORTRAN
    SW supplied by:
    • The Portland Group
    • NCSA release
  Python, Java, …
    Compute Kernels
    Driver API Bindings

NVIDIA GPU with the CUDA Parallel Computing Architecture

NVIDIA Nexus
           Nexus is a GPU application development suite that integrates
           directly into Visual Studio.
                            A C/CUDA source debugger for both the CUDA runtime and driver API
                            New C/CUDA performance analysis/trace tools




FSI CUSTOMER DEPLOYMENTS




Case Study: Equity Derivatives




            GPU                  Advantage            CPU
            15                   15x Faster           1
            2 Tesla S1070        16x Less Space       500 CPU Cores
            $24 K                10x Lower Cost       $250 K
            2.8 kW               13x Lower Power      37.5 kW

            Source: BNP Paribas, March 4, 2009
Case Study: Security Pricing




            GPU                  Advantage            CPU
            2 hours              8x Faster            16 hours
            48 Tesla S1070       10x Less Space       8000 CPU Cores


            Source: Wall Street & Technology, September 24, 2009
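
The headline ratios in the two case studies can be checked directly from the figures quoted on the slides. The short sketch below does only that arithmetic; the dictionary layout and the `ratio` helper are hypothetical names introduced for illustration, and the space figures are left out because the slides give no rack measurements to compute them from.

```python
def ratio(cpu_value, gpu_value):
    """Return how many times larger the CPU figure is than the GPU figure."""
    return cpu_value / gpu_value


# Figures as quoted on the case-study slides.
equity_derivatives = {
    "cost_usd": {"gpu": 24_000, "cpu": 250_000},
    "power_kw": {"gpu": 2.8, "cpu": 37.5},
}
security_pricing = {
    "runtime_hours": {"gpu": 2, "cpu": 16},
}

if __name__ == "__main__":
    for study in (equity_derivatives, security_pricing):
        for metric, values in study.items():
            print(f"{metric}: {ratio(values['cpu'], values['gpu']):.1f}x")
```

The computed cost and power ratios come out slightly above the rounded "10x" and "13x" shown on the slide, and the runtime ratio matches the quoted "8x".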
