Kostas Tzoumas, Stephan Ewen
Flink committers
co-founders, data Artisans
@kostas_tzoumas
@StephanEwen
Apache Flink
What is Flink
• Collection programming APIs for batch and real-time streaming analysis
• Backed by a very robust execution backend:
  • with true streaming capabilities,
  • custom memory manager,
  • native iteration execution,
  • and a cost-based optimizer.
2
The case for Flink
• Performance and ease of use
  • Exploits in-memory execution and pipelining, language-embedded logical APIs
• Unified batch and real-time streaming
  • Batch and Stream APIs on top of a streaming engine
• A runtime that "just works" without tuning
  • C++ style memory management inside the JVM
• Predictable and dependable execution
  • Bird's-eye view of what runs and how, and what failed and why
3
Example: WordCount
4
case class Word (word: String, frequency: Int)

val env = ExecutionEnvironment.getExecutionEnvironment

env.readTextFile(...)
  .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency").print()

env.execute()
Flink has mirrored Java and Scala APIs that offer the same
functionality, including by-name addressing.
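To illustrate what the groupBy/sum pipeline computes, here is the same word count stripped of the distributed runtime — a plain-Java sketch of the semantics, not Flink's API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // flatMap: split each line into (word, 1); groupBy + sum: add up frequencies
    static Map<String, Integer> wordCount(Iterable<String> lines) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String line : lines)
            for (String word : line.split(" "))
                freq.merge(word, 1, Integer::sum);
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be")));
    }
}
```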
Example: Window WordCount
5
case class Word (word: String, frequency: Int)

val env = StreamExecutionEnvironment.getExecutionEnvironment

val lines = env.fromSocketStream(...)

lines
  .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .window(Count.of(100)).every(Count.of(10))
  .groupBy("word").sum("frequency").print()

env.execute()
Defining windows
• Trigger policy
  • When to trigger the computation on the current window
• Eviction policy
  • When data points should leave the window
  • Defines window width/size
• E.g., count-based policy
  • evict when #elements > n
  • start a new window every n-th element
• Built-in: Count, Time, Delta policies
6
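A minimal plain-Java sketch of the two policies working together — a count-based eviction policy with a count-based trigger, scaled down here to a window of 5 sliding every 3. This is an illustration of the semantics, not Flink's implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class CountWindowSketch {
    // Eviction policy: keep at most `size` elements in the window.
    // Trigger policy: emit a snapshot of the window every `slide` elements.
    static List<List<Integer>> countWindows(List<Integer> stream, int size, int slide) {
        Deque<Integer> window = new ArrayDeque<>();
        List<List<Integer>> emitted = new ArrayList<>();
        int sinceTrigger = 0;
        for (int element : stream) {
            window.addLast(element);
            if (window.size() > size) window.removeFirst(); // evict oldest
            if (++sinceTrigger == slide) {                  // trigger computation
                emitted.add(new ArrayList<>(window));
                sinceTrigger = 0;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(countWindows(
            Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 5, 3));
    }
}
```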
Flink API in a nutshell
• map, flatMap, filter, groupBy, reduce, reduceGroup, aggregate, join, coGroup, cross, project, distinct, union, iterate, iterateDelta, ...
• All Hadoop input formats are supported
• API similar for data sets and data streams, with slightly different operator semantics
• Window functions for data streams
• Counters, accumulators, and broadcast variables
7
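The join/coGroup pair is worth a sketch: join emits one record per matching pair, while coGroup invokes the user function once per key with both groups at hand, even when one side is empty. A plain-Java illustration of the semantics, not Flink's API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JoinVsCoGroupSketch {
    // join: one output record per matching (left, right) pair, inner-join style
    static List<String> join(Map<String, List<String>> left, Map<String, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            List<String> matches = right.get(e.getKey());
            if (matches == null) continue;
            for (String l : e.getValue())
                for (String r : matches)
                    out.add(e.getKey() + ":" + l + "+" + r);
        }
        return out;
    }

    // coGroup: one call per key, with both groups (possibly empty) visible at once
    static List<String> coGroup(Map<String, List<String>> left, Map<String, List<String>> right) {
        Set<String> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> out = new ArrayList<>();
        for (String key : keys)
            out.add(key + ":" + left.getOrDefault(key, Collections.emptyList()).size()
                        + "/" + right.getOrDefault(key, Collections.emptyList()).size());
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> left = new LinkedHashMap<>();
        left.put("a", Arrays.asList("x"));
        left.put("b", Arrays.asList("y"));
        Map<String, List<String>> right = new LinkedHashMap<>();
        right.put("a", Arrays.asList("z"));
        System.out.println(join(left, right));    // one record per matching pair
        System.out.println(coGroup(left, right)); // one record per key
    }
}
```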
Flink stack
8
(Stack diagram, top to bottom:)
• APIs: Scala API, Java API, Python API (upcoming), Graph API (Gelly), Apache MRQL
• Common API, backed by the Flink Optimizer and the Flink Stream Builder
• Flink Local Runtime, running in an embedded environment (Java collections), a local environment (for debugging), a remote environment (regular cluster execution), or on Apache Tez
• Deployment: single node execution, standalone or YARN cluster
• Data storage: HDFS, local files, S3, JDBC, Flume, RabbitMQ, Kafka, HBase, ...
Technology inside Flink
• Technology inspired by compilers + MPP databases + distributed systems
• For ease of use, reliable performance, and scalability

case class Path (from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) => Path(path.from, edge.to) }
    .union(paths)
    .distinct()
  next
}

(Diagram: the pre-flight (client), master, and worker stages, annotated with the technology involved: cost-based optimizer, type extraction stack, memory manager, out-of-core algos, real-time streaming, task scheduling, recovery metadata, data serialization stack, streaming network stack, ...)
Notable runtime features
1. Pipelined data transfers
2. Management of memory
3. Native iterations
4. Program optimization
10
Pipelined data transfers
11
Staged (batch) execution
(Figure: the log "Romeo, Romeo, where art thou Romeo?" is loaded once, then grepped by three consumers — Grep 1 searches for str1, Grep 2 for str2, Grep 3 for str3.)
Stage 1: create/cache the log.
Subsequent stages: grep the log for matches.
Caching in-memory, and on disk if needed.
12
Pipelined execution
(Figure: the same grep job, but Load Log and the three Greps are deployed together, and records stream between them.)
Stage 1: deploy and start operators.
Data transfer in-memory, and on disk if needed.
Note: the Log DataSet is never "created"!
13
Pipelining in Flink
• Currently the default mode of operation
  • Much better performance in many cases – no need to materialize large data sets
  • Supports both batch and real-time streaming
• Pluggable in the future:
  • Batch will use a combination of blocking and pipelining
  • Streaming will use pipelining
  • Interactive will use blocking
14
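The contrast can be sketched with plain Java streams — an illustration of the idea, not of Flink's runtime. In the pipelined version records flow straight through the operator chain; in the staged version the intermediate data set is materialized first:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PipeliningSketch {
    // Pipelined: records flow from the source straight through the filter;
    // the intermediate data set is never materialized.
    static List<String> pipelinedGrep(Stream<String> log, String needle) {
        return log.filter(line -> line.contains(needle))
                  .collect(Collectors.toList());
    }

    // Staged: the intermediate data set is fully materialized (the stage
    // barrier) before the next operator starts reading it.
    static List<String> stagedGrep(Stream<String> log, String needle) {
        List<String> cached = log.collect(Collectors.toList()); // stage barrier
        return cached.stream()
                     .filter(line -> line.contains(needle))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(pipelinedGrep(Stream.of("Romeo, Romeo", "where art thou"), "Romeo"));
        System.out.println(stagedGrep(Stream.of("Romeo, Romeo", "where art thou"), "Romeo"));
    }
}
```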
Memory management
15
Memory management in Flink
public class WC {
  public String word;
  public int count;
}

(Figure: a pool of empty memory pages. Managed memory backs sorting, hashing, caching, shuffling, and broadcasts; user code objects remain unmanaged on the JVM heap.)
16
Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly through an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components.
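The idea can be sketched with a toy page pool: records are serialized into fixed-size pages drawn from a pre-allocated pool rather than kept as heap objects. This is an illustration only — Flink's actual memory-segment machinery is far more elaborate:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

public class MemoryPageSketch {
    // A toy pool of fixed-size pages, pre-allocated up front.
    static class PagePool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();
        PagePool(int pages, int pageSize) {
            for (int i = 0; i < pages; i++) free.push(ByteBuffer.allocate(pageSize));
        }
        ByteBuffer acquire() { return free.pop(); }
        void release(ByteBuffer page) { page.clear(); free.push(page); }
    }

    // Serialize a (word, count) record into the page: length-prefixed bytes
    // plus the count, instead of one heap object per record.
    static void writeRecord(ByteBuffer page, String word, int count) {
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        page.putInt(bytes.length).put(bytes).putInt(count);
    }

    static String readRecord(ByteBuffer page) {
        byte[] bytes = new byte[page.getInt()];
        page.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8) + "=" + page.getInt();
    }

    public static void main(String[] args) {
        PagePool pool = new PagePool(4, 64);
        ByteBuffer page = pool.acquire();
        writeRecord(page, "flink", 1);
        page.flip();
        System.out.println(readRecord(page));
        pool.release(page);
    }
}
```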
Configuring Flink
• Per job
  • Parallelism
• System config
  • Total JVM heap size (-Xmx)
  • % of total JVM size for the Flink runtime
  • Memory for network buffers (soon not needed)
• That's all you need: the system will not throw an OOM exception at you.
17
Benefits of managed memory
• More reliable and stable performance (fewer GC effects, easy to go to disk)
18
Native iterative processing
19
Example: Transitive Closure
20
case class Path (from: Long, to: Long)

val env = ExecutionEnvironment.getExecutionEnvironment
val edges = ...

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges).where("to").equalTo("from") {
      (path, edge) => Path(path.from, edge.to)
    }
    .union(paths).distinct()
  next
}

tc.print()
env.execute()
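The semantics of this program, minus the distributed runtime, can be sketched in plain Java: repeatedly join paths with edges on to == from, union with the previous paths, and deduplicate, until a fixpoint or the iteration limit is reached:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashSet;
import java.util.Map.Entry;
import java.util.Set;

public class TransitiveClosureSketch {
    // Each iteration: join paths with edges on path.to == edge.from,
    // union with the previous paths, deduplicate via the Set.
    static Set<Entry<Long, Long>> transitiveClosure(Set<Entry<Long, Long>> edges, int maxIterations) {
        Set<Entry<Long, Long>> paths = new HashSet<>(edges);
        for (int i = 0; i < maxIterations; i++) {
            Set<Entry<Long, Long>> next = new HashSet<>(paths);
            for (Entry<Long, Long> path : paths)
                for (Entry<Long, Long> edge : edges)
                    if (path.getValue().equals(edge.getKey()))
                        next.add(new SimpleEntry<>(path.getKey(), edge.getValue()));
            if (next.equals(paths)) break; // fixpoint reached before the limit
            paths = next;
        }
        return paths;
    }

    public static void main(String[] args) {
        Set<Entry<Long, Long>> edges = new HashSet<>();
        edges.add(new SimpleEntry<>(1L, 2L));
        edges.add(new SimpleEntry<>(2L, 3L));
        System.out.println(transitiveClosure(edges, 10));
    }
}
```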
Iterate natively
21
(Figure: native iteration — the step function joins the partial solution X with other data sets Y and produces a new partial solution that replaces the old one, starting from the initial solution and ending in the iteration result.)
Iterate natively with deltas
22
(Figure: native delta iteration — the step function consumes the partial solution X, the workset, and other data sets Y; it produces a delta set whose deltas are merged into the partial solution, plus the next workset, starting from the initial solution and initial workset and ending in the iteration result.)
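A plain-Java sketch of the delta-iteration idea, using connected components as the example (an illustration, not Flink's iterateDelta API): only vertices whose value changed stay in the workset, so each round touches fewer elements:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeltaIterationSketch {
    // Solution set: vertex -> component id. Workset: vertices updated in the
    // previous round. Each round only examines neighbors of workset vertices,
    // so the work shrinks as the computation converges.
    static Map<Integer, Integer> connectedComponents(Map<Integer, Set<Integer>> graph) {
        Map<Integer, Integer> solution = new HashMap<>();
        for (int v : graph.keySet()) solution.put(v, v);      // initial solution
        Set<Integer> workset = new HashSet<>(graph.keySet()); // initial workset
        while (!workset.isEmpty()) {
            Set<Integer> delta = new HashSet<>();
            for (int v : workset)
                for (int neighbor : graph.get(v))
                    if (solution.get(v) < solution.get(neighbor)) {
                        solution.put(neighbor, solution.get(v)); // merge delta
                        delta.add(neighbor);                     // changed -> next workset
                    }
            workset = delta;
        }
        return solution;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        graph.put(1, new HashSet<>(Arrays.asList(2)));
        graph.put(2, new HashSet<>(Arrays.asList(1, 3)));
        graph.put(3, new HashSet<>(Arrays.asList(2)));
        graph.put(4, new HashSet<>());
        System.out.println(connectedComponents(graph));
    }
}
```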
Effect of delta iterations
(Chart: # of elements updated per iteration, y-axis 0 to 45,000,000, x-axis iterations 1 to 61.)
Iteration performance
24
(Chart: iteration performance, Flink native iterations vs. a MapReduce-style loop.)
Closing
25
Flink roadmap for 2015
• Unify batch and streaming
• Machine learning library and Mahout
• Graph processing library improvements
• Interactive programs and Zeppelin
• Logical queries and SQL
• And many more
26
Flink community
(Chart: # of unique contributors by git commits, without manual de-dup, from Aug 2010 to Jul 2015, y-axis 0 to 120.)
flink.apache.org
@ApacheFlink

January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
