Scalable Scientific Computing with Dask
PyCon.DE / PyData Karlsruhe 2018
Uwe L. Korn
About me
• Senior Data Scientist at Blue Yonder (@BlueYonderTech)
• Apache {Arrow, Parquet} PMC
• Data Engineer and Architect with a heavy focus on Pandas
xhochy
mail@uwekorn.com
What is Dask?
• A parallel computing library that scales the existing Python ecosystem
• Definition and execution of task graphs
• Scales down to your laptop
• Scales up to a cluster
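To make "definition and execution of task graphs" concrete, here is a minimal sketch using `dask.delayed`: the decorated calls only record a task graph, and nothing runs until `compute()` hands that graph to a scheduler. The function names `inc` and `add` are illustrative, not from the slides.

```python
import dask

@dask.delayed
def inc(x):
    # A plain Python function, wrapped so calls become graph nodes
    return x + 1

@dask.delayed
def add(x, y):
    return x + y

# These calls are lazy: they build a task graph instead of executing
a = inc(1)
b = inc(2)
total = add(a, b)

# compute() executes the graph with a scheduler (threads by default);
# the same graph could run on a laptop or a distributed cluster
result = total.compute()
print(result)  # 5
```

The same code runs unchanged whether the scheduler is the local thread pool or a distributed cluster, which is the "scales down / scales up" point above.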
More than a single CPU
• Multi-core and distributed parallel execution
• Low-level: task schedulers for computation graphs
• High-level: Array, Bag and DataFrame
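As a sketch of the high-level collections, `dask.array` mirrors the NumPy API but splits the data into chunks, so each chunked operation becomes a node in the task graph that the low-level scheduler executes in parallel. The array shape and chunk size here are arbitrary choices for illustration.

```python
import dask.array as da

# A 1000-element array of ones, split into ten 100-element chunks;
# each chunk can be processed by a separate task
x = da.ones((1000,), chunks=(100,))

# NumPy-style expressions build a task graph over the chunks;
# compute() triggers parallel execution and returns a concrete value
total = (x + 1).sum().compute()
print(total)  # 2000.0
```

`dask.bag` and `dask.dataframe` follow the same pattern for generic Python objects and Pandas-style tables, respectively.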
What about Spark?
Dask is
• More lightweight
• Written in Python and interoperates well with C/C++/Fortran/LLVM or other natively compiled code
• Part of the Python ecosystem
What about Spark?
Spark is
• Written in Scala and works well within the JVM
• Python support is very limited
• Brings its own ecosystem
• Able to provide more high-level optimizations
https://github.com/mrocklin/pydata-nyc-2018-tutorial