Kostas Tzoumas, Stephan Ewen
Flink committers
co-founders, data Artisans
@kostas_tzoumas
@StephanEwen
Apache Flink
What is Flink
• Collection programming APIs for batch and real-time streaming analysis
• Backed by a very robust execution backend:
  • with true streaming capabilities,
  • custom memory manager,
  • native iteration execution,
  • and a cost-based optimizer.
2
The case for Flink
• Performance and ease of use
  • Exploits in-memory execution and pipelining, language-embedded logical APIs
• Unified batch and real-time streaming
  • Batch and Stream APIs on top of a streaming engine
• A runtime that "just works" without tuning
  • C++ style memory management inside the JVM
• Predictable and dependable execution
  • Bird's-eye view of what runs and how, and what failed and why
3
Example: WordCount
4
case class Word (word: String, frequency: Int)

val env = ExecutionEnvironment.getExecutionEnvironment

env.readTextFile(...)
  .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency").print()

env.execute()
Flink has mirrored Java and Scala APIs that offer the same
functionality, including by-name addressing.
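To illustrate what the groupBy/sum pipeline computes, here is the same word count stripped of the distributed runtime — a plain-Java sketch of the semantics, not Flink's API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // flatMap: split each line into (word, 1); groupBy + sum: add up frequencies
    static Map<String, Integer> wordCount(Iterable<String> lines) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String line : lines)
            for (String word : line.split(" "))
                freq.merge(word, 1, Integer::sum);
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be")));
    }
}
```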
Example: Window WordCount
5
case class Word (word: String, frequency: Int)

val env = StreamExecutionEnvironment.getExecutionEnvironment

val lines = env.fromSocketStream(...)

lines
  .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .window(Count.of(100)).every(Count.of(10))
  .groupBy("word").sum("frequency").print()

env.execute()
Defining windows
• Trigger policy
  • When to trigger the computation on the current window
• Eviction policy
  • When data points should leave the window
  • Defines window width/size
• E.g., count-based policy
  • evict when #elements > n
  • start a new window every n-th element
• Built-in: Count, Time, Delta policies
6
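A minimal plain-Java sketch of the two policies working together — a count-based eviction policy with a count-based trigger, scaled down here to a window of 5 sliding every 3. This is an illustration of the semantics, not Flink's implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class CountWindowSketch {
    // Eviction policy: keep at most `size` elements in the window.
    // Trigger policy: emit a snapshot of the window every `slide` elements.
    static List<List<Integer>> countWindows(List<Integer> stream, int size, int slide) {
        Deque<Integer> window = new ArrayDeque<>();
        List<List<Integer>> emitted = new ArrayList<>();
        int sinceTrigger = 0;
        for (int element : stream) {
            window.addLast(element);
            if (window.size() > size) window.removeFirst(); // evict oldest
            if (++sinceTrigger == slide) {                  // trigger computation
                emitted.add(new ArrayList<>(window));
                sinceTrigger = 0;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(countWindows(
            Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 5, 3));
    }
}
```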
Flink API in a nutshell
• map, flatMap, filter, groupBy, reduce, reduceGroup, aggregate, join, coGroup, cross, project, distinct, union, iterate, iterateDelta, ...
• All Hadoop input formats are supported
• API similar for data sets and data streams, with slightly different operator semantics
• Window functions for data streams
• Counters, accumulators, and broadcast variables
7
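The join/coGroup pair is worth a sketch: join emits one record per matching pair, while coGroup invokes the user function once per key with both groups at hand, even when one side is empty. A plain-Java illustration of the semantics, not Flink's API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JoinVsCoGroupSketch {
    // join: one output record per matching (left, right) pair, inner-join style
    static List<String> join(Map<String, List<String>> left, Map<String, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            List<String> matches = right.get(e.getKey());
            if (matches == null) continue;
            for (String l : e.getValue())
                for (String r : matches)
                    out.add(e.getKey() + ":" + l + "+" + r);
        }
        return out;
    }

    // coGroup: one call per key, with both groups (possibly empty) visible at once
    static List<String> coGroup(Map<String, List<String>> left, Map<String, List<String>> right) {
        Set<String> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> out = new ArrayList<>();
        for (String key : keys)
            out.add(key + ":" + left.getOrDefault(key, Collections.emptyList()).size()
                        + "/" + right.getOrDefault(key, Collections.emptyList()).size());
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> left = new LinkedHashMap<>();
        left.put("a", Arrays.asList("x"));
        left.put("b", Arrays.asList("y"));
        Map<String, List<String>> right = new LinkedHashMap<>();
        right.put("a", Arrays.asList("z"));
        System.out.println(join(left, right));    // one record per matching pair
        System.out.println(coGroup(left, right)); // one record per key
    }
}
```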
Flink stack
8
(Stack diagram, top to bottom:)
• APIs: Scala API, Java API, Python API (upcoming), Graph API (Gelly), Apache MRQL
• Common API, backed by the Flink Optimizer and the Flink Stream Builder
• Flink Local Runtime, running in an embedded environment (Java collections), a local environment (for debugging), a remote environment (regular cluster execution), or on Apache Tez
• Deployment: single node execution, standalone or YARN cluster
• Data storage: HDFS, local files, S3, JDBC, Flume, RabbitMQ, Kafka, HBase, ...
Technology inside Flink
• Technology inspired by compilers + MPP databases + distributed systems
• For ease of use, reliable performance, and scalability

case class Path (from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) => Path(path.from, edge.to) }
    .union(paths)
    .distinct()
  next
}

(Diagram: the pre-flight (client), master, and worker stages, annotated with the technology involved: cost-based optimizer, type extraction stack, memory manager, out-of-core algos, real-time streaming, task scheduling, recovery metadata, data serialization stack, streaming network stack, ...)
Notable runtime features
1. Pipelined data transfers
2. Management of memory
3. Native iterations
4. Program optimization
10
Pipelined data transfers
11
Staged (batch) execution
(Figure: the log "Romeo, Romeo, where art thou Romeo?" is loaded once, then grepped by three consumers — Grep 1 searches for str1, Grep 2 for str2, Grep 3 for str3.)
Stage 1: create/cache the log.
Subsequent stages: grep the log for matches.
Caching in-memory, and on disk if needed.
12
Pipelined execution
(Figure: the same grep job, but Load Log and the three Greps are deployed together, and records stream between them.)
Stage 1: deploy and start operators.
Data transfer in-memory, and on disk if needed.
Note: the Log DataSet is never "created"!
13
Pipelining in Flink
• Currently the default mode of operation
  • Much better performance in many cases – no need to materialize large data sets
  • Supports both batch and real-time streaming
• Pluggable in the future:
  • Batch will use a combination of blocking and pipelining
  • Streaming will use pipelining
  • Interactive will use blocking
14
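The contrast can be sketched with plain Java streams — an illustration of the idea, not of Flink's runtime. In the pipelined version records flow straight through the operator chain; in the staged version the intermediate data set is materialized first:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PipeliningSketch {
    // Pipelined: records flow from the source straight through the filter;
    // the intermediate data set is never materialized.
    static List<String> pipelinedGrep(Stream<String> log, String needle) {
        return log.filter(line -> line.contains(needle))
                  .collect(Collectors.toList());
    }

    // Staged: the intermediate data set is fully materialized (the stage
    // barrier) before the next operator starts reading it.
    static List<String> stagedGrep(Stream<String> log, String needle) {
        List<String> cached = log.collect(Collectors.toList()); // stage barrier
        return cached.stream()
                     .filter(line -> line.contains(needle))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(pipelinedGrep(Stream.of("Romeo, Romeo", "where art thou"), "Romeo"));
        System.out.println(stagedGrep(Stream.of("Romeo, Romeo", "where art thou"), "Romeo"));
    }
}
```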
Memory management
15
Memory management in Flink
public class WC {
  public String word;
  public int count;
}

(Figure: a pool of empty memory pages. Managed memory backs sorting, hashing, caching, shuffling, and broadcasts; user code objects remain unmanaged on the JVM heap.)
16
Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly through an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components.
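The idea can be sketched with a toy page pool: records are serialized into fixed-size pages drawn from a pre-allocated pool rather than kept as heap objects. This is an illustration only — Flink's actual memory-segment machinery is far more elaborate:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

public class MemoryPageSketch {
    // A toy pool of fixed-size pages, pre-allocated up front.
    static class PagePool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();
        PagePool(int pages, int pageSize) {
            for (int i = 0; i < pages; i++) free.push(ByteBuffer.allocate(pageSize));
        }
        ByteBuffer acquire() { return free.pop(); }
        void release(ByteBuffer page) { page.clear(); free.push(page); }
    }

    // Serialize a (word, count) record into the page: length-prefixed bytes
    // plus the count, instead of one heap object per record.
    static void writeRecord(ByteBuffer page, String word, int count) {
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        page.putInt(bytes.length).put(bytes).putInt(count);
    }

    static String readRecord(ByteBuffer page) {
        byte[] bytes = new byte[page.getInt()];
        page.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8) + "=" + page.getInt();
    }

    public static void main(String[] args) {
        PagePool pool = new PagePool(4, 64);
        ByteBuffer page = pool.acquire();
        writeRecord(page, "flink", 1);
        page.flip();
        System.out.println(readRecord(page));
        pool.release(page);
    }
}
```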
Configuring Flink
• Per job
  • Parallelism
• System config
  • Total JVM heap size (-Xmx)
  • % of total JVM size for the Flink runtime
  • Memory for network buffers (soon not needed)
• That's all you need: the system will not throw an OOM exception at you.
17
Benefits of managed memory
• More reliable and stable performance (fewer GC effects, easy to go to disk)
18
Native iterative processing
19
Example: Transitive Closure
20
case class Path (from: Long, to: Long)

val env = ExecutionEnvironment.getExecutionEnvironment
val edges = ...

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges).where("to").equalTo("from") {
      (path, edge) => Path(path.from, edge.to)
    }
    .union(paths).distinct()
  next
}

tc.print()
env.execute()
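The semantics of this program, minus the distributed runtime, can be sketched in plain Java: repeatedly join paths with edges on to == from, union with the previous paths, and deduplicate, until a fixpoint or the iteration limit is reached:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashSet;
import java.util.Map.Entry;
import java.util.Set;

public class TransitiveClosureSketch {
    // Each iteration: join paths with edges on path.to == edge.from,
    // union with the previous paths, deduplicate via the Set.
    static Set<Entry<Long, Long>> transitiveClosure(Set<Entry<Long, Long>> edges, int maxIterations) {
        Set<Entry<Long, Long>> paths = new HashSet<>(edges);
        for (int i = 0; i < maxIterations; i++) {
            Set<Entry<Long, Long>> next = new HashSet<>(paths);
            for (Entry<Long, Long> path : paths)
                for (Entry<Long, Long> edge : edges)
                    if (path.getValue().equals(edge.getKey()))
                        next.add(new SimpleEntry<>(path.getKey(), edge.getValue()));
            if (next.equals(paths)) break; // fixpoint reached before the limit
            paths = next;
        }
        return paths;
    }

    public static void main(String[] args) {
        Set<Entry<Long, Long>> edges = new HashSet<>();
        edges.add(new SimpleEntry<>(1L, 2L));
        edges.add(new SimpleEntry<>(2L, 3L));
        System.out.println(transitiveClosure(edges, 10));
    }
}
```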
Iterate natively
21
(Figure: native iteration — the step function joins the partial solution X with other data sets Y and produces a new partial solution that replaces the old one, starting from the initial solution and ending in the iteration result.)
Iterate natively with deltas
22
(Figure: native delta iteration — the step function consumes the partial solution X, the workset, and other data sets Y; it produces a delta set whose deltas are merged into the partial solution, plus the next workset, starting from the initial solution and initial workset and ending in the iteration result.)
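A plain-Java sketch of the delta-iteration idea, using connected components as the example (an illustration, not Flink's iterateDelta API): only vertices whose value changed stay in the workset, so each round touches fewer elements:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeltaIterationSketch {
    // Solution set: vertex -> component id. Workset: vertices updated in the
    // previous round. Each round only examines neighbors of workset vertices,
    // so the work shrinks as the computation converges.
    static Map<Integer, Integer> connectedComponents(Map<Integer, Set<Integer>> graph) {
        Map<Integer, Integer> solution = new HashMap<>();
        for (int v : graph.keySet()) solution.put(v, v);      // initial solution
        Set<Integer> workset = new HashSet<>(graph.keySet()); // initial workset
        while (!workset.isEmpty()) {
            Set<Integer> delta = new HashSet<>();
            for (int v : workset)
                for (int neighbor : graph.get(v))
                    if (solution.get(v) < solution.get(neighbor)) {
                        solution.put(neighbor, solution.get(v)); // merge delta
                        delta.add(neighbor);                     // changed -> next workset
                    }
            workset = delta;
        }
        return solution;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        graph.put(1, new HashSet<>(Arrays.asList(2)));
        graph.put(2, new HashSet<>(Arrays.asList(1, 3)));
        graph.put(3, new HashSet<>(Arrays.asList(2)));
        graph.put(4, new HashSet<>());
        System.out.println(connectedComponents(graph));
    }
}
```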
Effect of delta iterations
(Chart: # of elements updated per iteration, y-axis 0 to 45,000,000, x-axis iterations 1 to 61.)
Iteration performance
24
(Chart: iteration performance, Flink native iterations vs. a MapReduce-style loop.)
Closing
25
Flink roadmap for 2015
• Unify batch and streaming
• Machine learning library and Mahout
• Graph processing library improvements
• Interactive programs and Zeppelin
• Logical queries and SQL
• And many more
26
Flink community
(Chart: # of unique contributors by git commits, without manual de-dup, from Aug 2010 to Jul 2015, y-axis 0 to 120.)
flink.apache.org
@ApacheFlink

January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
