Streaming Event-Time Partitioning With Apache Flink And Apache Iceberg
Julia Bennett
Flink Forward
October 2019
What is this talk about?
● Story of building event-time partitioning in
Flink and Iceberg
○ Netflix playback data
○ Why event-time partitioning matters
○ Introduce Iceberg
○ Implementation details
○ Tradeoffs
Playback Data
Processing Time: 2019-08-15, 11:53
Event Time: 2019-08-15, 9:17
Event Data
2019-08-15, 11:00
2019-08-15, 10:00
2019-08-15, 9:00
2019-08-15, 8:00
Table Partitioning
Processing Time?
Event Time?✔
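To make the choice concrete, here is a minimal sketch of how the same record lands in different hourly partitions depending on which timestamp is used. The function name `hourly_partition`, the UTC time zone, and the label format are assumptions for illustration, not details from the talk.

```python
from datetime import datetime, timezone


def hourly_partition(epoch_millis: int) -> str:
    """Return the hourly partition label (UTC assumed) for an epoch-millisecond timestamp."""
    ts = datetime.fromtimestamp(epoch_millis // 1000, tz=timezone.utc)
    return ts.strftime("%Y-%m-%d, %H:00")


def to_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# The timestamps from the slide: the event happened at 9:17 but was processed at 11:53.
event_time = to_millis(datetime(2019, 8, 15, 9, 17, tzinfo=timezone.utc))
processing_time = to_millis(datetime(2019, 8, 15, 11, 53, tzinfo=timezone.utc))

event_partition = hourly_partition(event_time)
processing_partition = hourly_partition(processing_time)
print(event_partition, "vs", processing_partition)
```

With event-time partitioning the record belongs to the 9:00 partition, which by 11:53 is already two hours in the past. That is why late arrivals force writes into older partitions.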
Before: Playback Data
[Diagram: Streaming (Near Real-Time) stage and Batch (Hourly) stage; labels: Process Time, Event Time, Raw Events, Business Logic, Session Summaries]
**Lots of late events
Common Pattern
[Diagram: Streaming (Near Real-Time) stage and Batch (Hourly) stage; labels: Process Time, Event Time, Business Logic, ?]
Better Pattern?
[Diagram: Streaming (Near Real-Time) stage only; labels: Event Time, Router, Business Logic, ?]
Why is this hard?
● Exactly once processing
○ Dedupe incoming stream
○ Append to table without duplicates (or loss)
● Out of order and late arriving events
○ Partition-level modifications
○ Efficiently modify long tail of partitions
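As an illustration of the dedupe requirement, the sketch below keeps a map of recently seen keys and drops repeats until a key expires. It is a plain-Python stand-in for keyed state with expiry, not Flink code; the class name `Deduplicator`, the `ttl_ms` parameter, and the expiry policy are assumptions.

```python
class Deduplicator:
    """Drops records whose dedupe key was already seen within the last ttl_ms."""

    def __init__(self, ttl_ms: int):
        self.ttl_ms = ttl_ms
        self.seen = {}  # dedupe key -> time (ms) at which the key expires

    def expire(self, now_ms: int) -> None:
        """Remove keys whose expiry time has passed (the role a timer would play)."""
        expired = [key for key, expiry in self.seen.items() if expiry <= now_ms]
        for key in expired:
            del self.seen[key]

    def accept(self, key: str, now_ms: int) -> bool:
        """Return True if the record should be kept, False if it is a duplicate."""
        self.expire(now_ms)
        if key in self.seen:
            return False
        self.seen[key] = now_ms + self.ttl_ms
        return True


dedupe = Deduplicator(ttl_ms=1000)
results = [
    dedupe.accept("a", now_ms=0),     # first sighting: kept
    dedupe.accept("a", now_ms=500),   # repeat inside the TTL: dropped
    dedupe.accept("b", now_ms=600),   # different key: kept
    dedupe.accept("a", now_ms=1000),  # key "a" has expired: kept again
]
print(results)
```

The expiry bounds the size of the state, at the cost of letting through duplicates that arrive further apart than the TTL.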
Iceberg Sink
● Apache Iceberg (Incubating) is a new table
format developed at Netflix
○ Atomic commits and file-level changes
● Flink + Iceberg: Exactly once partition appends
○ Buffers records into partitioned files
○ Commits with each checkpoint
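The buffer-then-commit behavior described above can be sketched with an in-memory stand-in. This is not the Iceberg or Flink API; `BufferingSink`, `write`, and `checkpoint` are hypothetical names, and a "file" here is just a tuple of records.

```python
from collections import defaultdict


class BufferingSink:
    """Buffers records per partition and commits all open files at each checkpoint."""

    def __init__(self):
        self.open_files = defaultdict(list)  # partition -> records buffered since last checkpoint
        self.snapshots = []                  # committed table history, one entry per commit

    def write(self, partition: str, record: str) -> None:
        """Append a record to the open file for its partition."""
        self.open_files[partition].append(record)

    def checkpoint(self) -> int:
        """Close every open file and publish them together; return the number of files."""
        files = [(partition, tuple(records))
                 for partition, records in sorted(self.open_files.items())]
        self.open_files.clear()
        if files:
            # One commit per checkpoint: readers see all of these files or none of them.
            self.snapshots.append(files)
        return len(files)


sink = BufferingSink()
sink.write("2019-08-15-09", "event-1")
sink.write("2019-08-15-09", "event-2")
sink.write("2019-08-15-10", "event-3")
files_committed = sink.checkpoint()
empty_checkpoint = sink.checkpoint()
print(files_committed, empty_checkpoint, len(sink.snapshots))
```

Note that every partition touched between two checkpoints yields at least one new file, which matters once events are spread across many partitions.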
This should be easy...
Event Time Router
.sink(Iceberg)
Problem: Too many files
Oh, and dedupe too
(… e.g. 250 MILLION per day)
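A rough way to see why the file count grows so quickly: if every parallel writer closes one file per open partition at every checkpoint, the daily total is the product of those three factors. The function below is a back-of-the-envelope sketch; all input values are hypothetical and are not the figures behind the number on the slide.

```python
def max_files_per_day(writer_subtasks: int,
                      open_partitions_per_writer: int,
                      checkpoint_interval_s: int) -> int:
    """Upper bound on files per day, assuming one file per writer per open partition per checkpoint."""
    checkpoints_per_day = 24 * 60 * 60 // checkpoint_interval_s
    return writer_subtasks * open_partitions_per_writer * checkpoints_per_day


# Hypothetical job: 100 writers, each seeing 10 partitions, checkpointing every minute.
estimate = max_files_per_day(writer_subtasks=100,
                             open_partitions_per_writer=10,
                             checkpoint_interval_s=60)
print(estimate)
```

Reducing any one factor (fewer writers per partition, fewer partitions per writer, or less frequent flushing of late data) shrinks the total multiplicatively.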
[Pipeline diagram, components: Source, Dedupe, Dynamic Partitioner, Punctual Stream, Late Batching, Iceberg Sink, Offline Compaction]
Reduces # Files & Skew
● Dedupe
○ State: Dedupe Key State
○ Keyed Process Function + Timer Service
● Dynamic Partitioner
○ State: Traffic Metrics State
○ Side Output + Window + Broadcast
● Late Batching (alongside the Punctual Stream)
○ State: Late Events State
○ Keyed Process Function + Timer Service
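The split between the Punctual Stream and Late Batching can be sketched as follows: on-time events are emitted immediately, while late events are buffered per partition and flushed together when a timer fires. This is plain Python rather than a Flink Keyed Process Function; the class name, the lateness threshold, and the flush interval are assumptions, since the slides do not say how lateness is decided.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000


class LateEventRouter:
    """Emits punctual events at once; batches late events per hourly partition."""

    def __init__(self, lateness_threshold_ms: int, flush_interval_ms: int):
        self.lateness_threshold_ms = lateness_threshold_ms
        self.flush_interval_ms = flush_interval_ms
        self.late_buffer = defaultdict(list)  # hourly partition -> buffered late events
        self.next_flush_ms = None             # when the pending flush timer fires, if any

    def on_timer(self, now_ms: int) -> list:
        """Flush all buffered late events if the timer is due; otherwise emit nothing."""
        if self.next_flush_ms is None or now_ms < self.next_flush_ms:
            return []
        batches = [("late", partition, events)
                   for partition, events in sorted(self.late_buffer.items())]
        self.late_buffer.clear()
        self.next_flush_ms = None
        return batches

    def process(self, event_time_ms: int, payload: str, now_ms: int) -> list:
        """Route one event and return any (stream, partition, events) emissions."""
        emitted = self.on_timer(now_ms)
        partition = event_time_ms // HOUR_MS
        if now_ms - event_time_ms <= self.lateness_threshold_ms:
            emitted.append(("punctual", partition, [payload]))
        else:
            self.late_buffer[partition].append(payload)
            if self.next_flush_ms is None:
                self.next_flush_ms = now_ms + self.flush_interval_ms
        return emitted


router = LateEventRouter(lateness_threshold_ms=HOUR_MS, flush_interval_ms=10 * 60 * 1000)
now = 10 * HOUR_MS
on_time = router.process(now - 1000, "a", now)
late_1 = router.process(now - 3 * HOUR_MS, "b", now)
late_2 = router.process(now - 3 * HOUR_MS + 5, "c", now + 1)
flushed = router.on_timer(now + 10 * 60 * 1000)
print(on_time, late_1, late_2, flushed)
```

Batching means each old partition receives one larger write per flush instead of one small file per late event, which is the kind of file-count reduction the diagram points at.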
After: Playback Data
[Diagram: Streaming (Near Real-Time) stage only; labels: Event Time, Router, Business Logic, Raw Events, Session Summaries]
~100K Files Per Day
[Pipeline diagram repeated, components: Source, Dedupe, Dynamic Partitioner, Punctual Stream, Late Batching, Iceberg Sink, Offline Compaction]
Generic event-time partitioning out of the box
New Pattern
[Diagram: Streaming (Near Real-Time) stage only; labels: Event Time, Router, Business Logic, ?]
Tradeoffs
● Low-latency streaming output
● Removes the need to operate an offline batch job
● Failure recovery is hard
○ Multiple long-lived states
○ Writing directly to production table
● Complexity, even if abstracted away
Team Effort
Lokesh
Balakrishnan
And next time...
Event Time Router
.sink(Iceberg)

Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Julia Bennett, Netflix