Numba
NumPy-aware dynamic Python compiler

               Travis E. Oliphant


   SciPy 2012. Austin, TX, USA. July 18, 2012
Motivation
• Python is great for rapid development
  and high-level thinking-in-code
• It is slow for interior loops because the lack
  of type information leads to a lot of
  indirection and “extra” code.
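
One way to see the missing type information is to inspect the bytecode CPython produces for a small loop. The sketch below uses only the standard-library `dis` module. The function `dot` is a hypothetical example, not something from the talk, and the exact opcode names depend on the CPython version.

```python
import dis


def dot(xs, ys):
    """Inner loop written in plain Python (hypothetical example)."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


# Collect the names of the bytecode instructions for this function.
opnames = [ins.opname for ins in dis.get_instructions(dot)]

# The arithmetic appears as generic opcodes (BINARY_OP on recent CPython,
# BINARY_MULTIPLY / INPLACE_ADD on older versions). The bytecode does not
# record whether the operands are ints, floats, or arbitrary objects, so
# the operation has to be resolved from the operands' runtime types.
arith = [name for name in opnames if name.startswith(("BINARY", "INPLACE"))]
print(arith)
```

The multiply and the add each show up as a type-agnostic instruction. That is the indirection a type-aware compiler can remove.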
Motivation
• NumPy users have a lot of type
  information --- but currently have only
  one-size-fits-all, pre-compiled, vectorized
  loops.
• Many new features envisioned will need
  the ability for high-level expressions to
  be compiled to machine code.
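
A minimal sketch of what "one-size-fits-all" loops mean in practice, assuming NumPy is installed. The arrays `a`, `b`, `c` are made-up illustration data. The explicit Python loop at the end only stands in for the single fused loop a compiler could generate; it is not fast itself.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
b = np.full(5, 2.0)
c = np.ones(5)

# Each operator runs one pre-compiled loop over the whole array, so this
# expression makes two passes and allocates a temporary for a * b.
result = a * b + c

# The same computation with the two passes written out explicitly.
tmp = np.multiply(a, b)            # first loop, fills the temporary
two_pass = np.add(tmp, c, out=tmp)  # second loop, reuses the buffer

# What a compiler could emit instead: one loop, no temporary array.
single_pass = np.empty_like(a)
for i in range(a.size):
    single_pass[i] = a[i] * b[i] + c[i]

print(result)
```

All three variants compute the same values. They differ only in how many passes over memory are made and how many intermediate arrays are allocated.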
Goals
 • Most developers should not have to write
   anything but Python -- or another, even
   higher-level, Domain Specific Language (DSL).
 • Create faster code using array-expressions from
   NumPy users -- Fortran is the initial target
 • Take advantage of multi-core and GPUs for a
   subset of Python.
Why Not PyPy?
• PyPy does not work with CPython
• PyPy is a (meta) “tracing” JIT. Machine code is
  generated on the fly so there is no “build step” -- but
  we want to support a “build step” when justified
• PyPy tries to speed up everything -- we want to
  optimize more specifically on numeric codes
  (including complex numbers)
               More to the story...
Why not Cython?

• Cython is great for what it does, but...
• Cython creates extension modules which cannot be
  “unloaded” dynamically
• Cython requires a full C-compiler
• Cython doesn’t do type inference -- you have to
  declare types on everything
• Cython is another syntax to learn
What’s the real motivation...
• “Computed columns” for data-types
• Always been bothered by how to write a fast version
  of “vectorize”
• and... I wanted to play with LLVM!
More Ranting
• The world needs more array-oriented compilers --
  Python has needed one for a decade at least.
• Array-oriented computing needs more light in CS
  curricula
• Most domain experts can write what they want at a
  high-level. Commonly this is then “translated” to a
  lower-level and then the compiler gets a hold of it.
  This is sub-optimal.
• Projects discussed here (Copperhead, Theano, etc.)
  are doing this, but are still niche.
More Ranting
• Today’s vector machines (and vector co-processors,
  or GPUs) were made for array-oriented computing.
• The software stack has just not caught up ---
  unfortunate because APL came out in 1963.
• There is a reason Fortran remains popular.
Array-Oriented Computing
• Loosely defined as “Organize data-together” and
 operate on it together (or in cache-size chunks) with
 array-level operations (e.g. NumPy)
  [Figure: six separate objects, each holding its own Attr1, Attr2
   and Attr3, contrasted with the same data laid out as a table with
   columns Attr1, Attr2, Attr3 and rows Object1 through Object6.]
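
The two layouts can be sketched in a few lines, assuming NumPy is installed. The class `Record` and the attribute values are hypothetical and chosen only to make the arithmetic easy to check.

```python
import numpy as np


# Object-oriented layout: one Python object per record, with the
# attributes of different records scattered across memory.
class Record:
    def __init__(self, attr1, attr2, attr3):
        self.attr1 = attr1
        self.attr2 = attr2
        self.attr3 = attr3


records = [Record(float(i), 2.0 * i, 0.5) for i in range(6)]
total_objects = sum(r.attr1 * r.attr2 + r.attr3 for r in records)

# Array-oriented layout: one contiguous array per attribute, so each
# operation sweeps over all six "objects" at once.
attr1 = np.arange(6, dtype=np.float64)
attr2 = 2.0 * attr1
attr3 = np.full(6, 0.5)
total_arrays = float(np.sum(attr1 * attr2 + attr3))

print(total_objects, total_arrays)
```

Both layouts hold the same data and give the same result. The array layout keeps each attribute together in memory, which is what array-level operations rely on.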
Goal:

        Numba should be the world’s best
           array-oriented compiler.
NumPy + Mamba = Numba
 Python Function  ->  Machine Code

   LLVM-PY
   LLVM Library
   ISPC | OpenCL | OpenMP | CUDA | CLANG
   Intel | AMD | Nvidia | Apple | ARM
Uses of Numba

 Python Function  ->  Numba  ->  function pointer  ->  NumPy Runtime

 NumPy Runtime:
  • Ufuncs
  • Generalized UFuncs
  • Window Kernel Funcs
  • Function-based Indexing
  • Memory Filters
  • I/O Filters
  • Reduction Filters
  • Computed Columns
Uses of Numba in SciPy
 • optimize
 • integrate
 • special
 • ode

 writing more of SciPy at high-level
Numba --- a deeper look

   Numba is a Python to LLVM translator. It
   translates Python to LLVM IR (the LLVM
   machinery is then used to create machine
  code from there). Numba is NumPy aware
    --- it understands NumPy’s type system,
      methods, C-API, and data-structures
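
As a rough illustration of the idea, the sketch below compiles a NumPy-typed loop with the `njit` decorator from present-day Numba. That decorator is an assumption about the installed library and is not necessarily the interface available when this talk was given. The function `sum_of_squares` is a hypothetical example. If Numba is not installed, the fallback runs the same code uncompiled.

```python
import numpy as np

try:
    from numba import njit  # present-day Numba decorator (assumed installed)
except ImportError:
    def njit(func):
        """Fallback so the sketch still runs, uncompiled, without Numba."""
        return func


@njit
def sum_of_squares(values):
    # With Numba, the element type comes from the array's dtype, so the
    # loop body can be translated to typed LLVM IR and then machine code.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(4, dtype=np.float64)
print(sum_of_squares(data))
```

The function body is ordinary Python. The type information needed for compilation comes from the NumPy array passed in.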
Numba -- written in Python
 • Numba itself is pure Python -- it uses (an
   updated) LLVM-py to interact with the LLVM
   C++ library to build a representation of the
   code in LLVM assembler.
 • LLVM then creates machine code (or a
   “bitcode” module which can be persisted or
   sent to another machine)
 • Machine-code is equivalent to a C-level
   function-pointer (e.g. a ctypes function)
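
To make the "function pointer" point concrete, here is a minimal standard-library sketch using `ctypes`. It does not involve Numba or LLVM: the callback built from the hypothetical Python function `square` merely stands in for compiled machine code, and `UNARY_DOUBLE` is an illustrative name for the C signature `double (*)(double)`.

```python
import ctypes

# C signature: double (*)(double)
UNARY_DOUBLE = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def square(x):
    """Hypothetical stand-in for a compiled kernel."""
    return x * x


# ctypes builds a real C-callable entry point around the Python function.
# A compiler would supply generated machine code here instead.
c_square = UNARY_DOUBLE(square)

# The function pointer itself is just an integer address...
address = ctypes.cast(c_square, ctypes.c_void_p).value

# ...and any address with the right signature can be wrapped and called.
from_address = UNARY_DOUBLE(address)
print(from_address(3.0))
```

Anything that accepts a C function pointer of this signature could be handed `address`, as long as `c_square` stays alive.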
Example
Examples
Demo
Status and Future
• Current master branch mostly due to Jon Riehl
  (Resilient Science) sponsored by Continuum
  Analytics, Inc. --- interprets bytecode directly
• New devel branch working with AST directly and
  making rapid progress
  - Mark Florisson (minivect)
  - Siu Kwan Lam (pymothoa)
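
The difference between the two front ends can be seen with the standard-library `ast` and `dis` modules. This is only a sketch of the two inputs a translator might start from, not Numba's own code. The function `axpy` is a hypothetical example, and opcode names vary by CPython version.

```python
import ast
import dis

source = "def axpy(a, x, y):\n    return a * x + y\n"

# AST view: a tree that preserves the structure of the expression.
tree = ast.parse(source)
func_node = tree.body[0]
return_expr = func_node.body[0].value  # the BinOp for (a * x) + y
ast_shape = (
    type(return_expr).__name__,          # outer node
    type(return_expr.op).__name__,       # outer operator
    type(return_expr.left.op).__name__,  # operator of the left operand
)

# Bytecode view: a flat list of stack-machine instructions, from which
# the expression structure has to be reconstructed.
namespace = {}
exec(compile(tree, "<sketch>", "exec"), namespace)
bytecode_ops = [ins.opname for ins in dis.get_instructions(namespace["axpy"])]

print(ast_shape)
print(bytecode_ops)
```

The AST keeps the nesting of `a * x` inside the addition explicit, while the bytecode encodes it only through the order of stack operations.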
Software Stack Future?
 Plateaus of Code re-use + DSLs

   SQL | R | TDPL | Matlab
   Python
   OBJC | C | FORTRAN | C++
   LLVM
Join Us!



      http://numba.github.com/numba
