Make Stream Processing
Towards ANSI SQL
Shaoxuan Wang
Alibaba Group
2018.6.20
Broadcom
High-Perf Platform
Facebook
Social Graph Storage
Alibaba Group
Real-Time Data Infra
Peking University
EECS
University of California
at San Diego
Computer Engineer
Flink Committer
Since 2017
Shaoxuan Wang
Alibaba Group
wshaoxuan@gmail.com
shaoxuan@apache.org
01 ANSI SQL for Stream Processing
02 Blink SQL Engine
03 Blink SQL Optimization
01 ANSI SQL for Stream
Processing
Optimized Declarative Understandable Stable
One Query, Same Result
Unify
Why SQL?
real-time
return one final result
correctness
emit results as early as possible
Batch versus Stream Processing
Batch Processing Stream Processing
VS
Stream processing emits intermediate results and
keeps refining them to ensure correctness
VS
WHAT & HOW: results are calculated
WHEN: to emit an (intermediate) result
HOW: to refine the results
ANSI SQL can Describe Stream Processing
Can be fully described by SQL
Does not affect business logic
Can be solved by SQL engine
Describe a Stream Processing
Stream
Dynamic Table
Apply
Changelog
user clicks
user clicks
Mary 1
Bob 1
Mary 2
Liz 1
Bob 2
Mary 3
Mary 1
Bob 1
Mary 2
Liz 1
Bob 2
Mary 3
Stream
Dynamic Table
Apply
Changelog
Introducing Dynamic Table
Stream-Table Duality
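The stream-table duality above can be sketched in a few lines of Python. This is only an illustration, not Blink code: it assumes each stream record is an upsert keyed by user, and the function name apply_changelog is made up for this example.

```python
# Hypothetical sketch: a stream of (user, clicks) records is treated as a
# changelog of upserts keyed by user. Applying it yields the dynamic table.
def apply_changelog(changelog):
    table = {}
    for user, clicks in changelog:
        table[user] = clicks      # upsert: insert a new row or overwrite the old one
        yield dict(table)         # snapshot of the dynamic table after this change

# The (user, clicks) stream shown on the slide.
stream = [("Mary", 1), ("Bob", 1), ("Mary", 2), ("Liz", 1), ("Bob", 2), ("Mary", 3)]

snapshots = list(apply_changelog(stream))
final_table = snapshots[-1]
print(final_table)
```

Reading the table's changes back out in order gives the same stream again, which is the sense in which the two representations are interchangeable.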
Continuous SQL Query on Dynamic Table
Input Stream:
Mary, ./home
Bob, ./cart
Mary, ./prod?id=1
Liz, ./home
Mary, ./prod?id=7
Liz, ./prod?id=3
Dynamic Table (clicks):
user url
Mary ./home
Bob ./cart
Mary ./prod?id=1
Liz ./home
Mary ./prod?id=7
Liz ./prod?id=3
Continuous SQL Query:
SELECT
user,
COUNT(url) as cnt
FROM clicks
GROUP BY user
Dynamic Table (result, updated in place as clicks arrive):
user cnt
Mary 3
Bob 1
Liz 2
Output Stream:
Mary, 1
Bob, 1
Mary, 2
Liz, 1
Mary, 3
Liz, 2
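A minimal Python sketch of how the GROUP BY query on this slide can be evaluated continuously. The function name continuous_count is an assumption for illustration, and this is not how Blink implements it internally. One updated row is emitted per incoming click.

```python
from collections import defaultdict

def continuous_count(clicks):
    """Incrementally evaluate: SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user."""
    counts = defaultdict(int)          # per-user state kept by the aggregation
    for user, _url in clicks:
        counts[user] += 1
        yield (user, counts[user])     # emit the refined result immediately

# Input stream in the order shown on the slide.
clicks = [
    ("Mary", "./home"),
    ("Bob", "./cart"),
    ("Mary", "./prod?id=1"),
    ("Liz", "./home"),
    ("Mary", "./prod?id=7"),
    ("Liz", "./prod?id=3"),
]

output_stream = list(continuous_count(clicks))
print(output_stream)
```

The emitted rows match the output stream on the slide, and the last row emitted for each user is the final result a batch run would return.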
Incorrect! This value
should be 2
Retraction for Refinement
Result Refinement can be very Complex
Retraction
Introducing Retraction
Retraction is not cost-free:
1. Events are doubled
2. Operators become more complex when
they have to handle retraction (e.g.
max/min aggregates)
You should not need to reason about retraction. Just write simple
queries; the SQL engine will ensure correctness.
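The need for retraction shows up once a second aggregation consumes the updates of a first one. The sketch below is an assumption about the kind of query the slide illustrates (the figure itself is not in the text): an inner count of clicks per user feeds an outer count of users per click count. All names are hypothetical.

```python
from collections import defaultdict

def count_per_user(clicks, with_retraction):
    """Inner aggregation: emits ('+', user, cnt), and ('-', user, old_cnt) when retracting."""
    counts = defaultdict(int)
    for user in clicks:
        old = counts[user]
        if with_retraction and old > 0:
            yield ("-", user, old)     # withdraw the previously emitted result
        counts[user] = old + 1
        yield ("+", user, old + 1)     # emit the refined result

def users_per_count(changes):
    """Outer aggregation: how many users currently have each click count."""
    freq = defaultdict(int)
    for op, _user, cnt in changes:
        freq[cnt] += 1 if op == "+" else -1
    return {cnt: n for cnt, n in freq.items() if n != 0}

clicks = ["Mary", "Bob", "Mary", "Liz"]
naive = users_per_count(count_per_user(clicks, with_retraction=False))
correct = users_per_count(count_per_user(clicks, with_retraction=True))
print(naive, correct)
```

Without retraction, Mary is still counted under cnt = 1 after her count moves to 2, so the outer result reports three users with one click instead of two. With retraction the result is correct, at the cost of an extra message for every update.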
02 Blink SQL Engine
Introducing Alibaba Blink
Blink 1.0:
enterprise edition of Flink
with lots of improvements
designed by Alibaba
Apache Flink
Alibaba’s Improvements
Blink 2.0:
a new unified high performance compute engine for
complete data applications
Introducing Blink
Runtime
DAG API & Operators
Query Processor
Query Optimizer & Query Executor
SQL & Table API
Relational
Local
Single JVM
Cloud
GCE, EC2
Cluster
Standalone, YARN
SQL
& Table API
Logical
Plan
Physical
Plan
Execution
DAG
exactly the same for batch & stream processing
Optimizer
stream processing has some unique designs
Same Results
Batch mode
Same SQL Query
Stream Mode
Architecture of Blink SQL Engine
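The "same SQL query, same results" idea can be checked on a toy scale. In this sketch SQLite stands in for a batch engine and a hand-written incremental loop stands in for stream mode. Neither is Blink, and the table and function names are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

QUERY = 'SELECT "user", COUNT(url) AS cnt FROM clicks GROUP BY "user"'

CLICKS = [
    ("Mary", "./home"), ("Bob", "./cart"), ("Mary", "./prod?id=1"),
    ("Liz", "./home"), ("Mary", "./prod?id=7"), ("Liz", "./prod?id=3"),
]

def batch_result(rows):
    """Batch mode: load the complete input, then run the query once."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE clicks ("user" TEXT, url TEXT)')
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result

def stream_result(rows):
    """Stream mode (simulated): update per-user state one record at a time."""
    counts = defaultdict(int)
    for user, _url in rows:
        counts[user] += 1
    return dict(counts)

print(batch_result(CLICKS) == stream_result(CLICKS))
```

Once every record has been consumed, the state of the streaming computation equals the batch answer.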
• ANSI SQL
• Major data types (numeric, varchar, binary, decimal, array, map)
• UDF/UDTF/UDAF
• Support all types of join (inner/left/right/full/semi/anti)
• Support over window, grouping window (tumbling, sliding, session)
• Supports various subqueries (correlated/uncorrelated)
• Advanced analysis (grouping set, cube, rollup…)
Stream processing with Blink SQL fully passes TPC-H, and the results are
the same as in batch processing
Blink SQL Functionalities
03 Blink SQL
Optimization
Predicate, Projection push-down
Sort related rules
State (MapState/ValueState)
Retraction
EMIT SLA -> MicroBatch
Join Reorder
Batch Processing Stream Processing
VS
Same as batch
Collect stats in different ways
Stream has unique design
Not useful for stream
Challenges & Opportunities for Stream Processing
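One reading of "EMIT SLA -> MicroBatch" is that, when the latency target allows it, records are buffered briefly and folded in memory before state is touched. The sketch below illustrates that idea only. The function names and the access counter are assumptions, not Blink APIs.

```python
from collections import defaultdict

def per_record(records):
    """Update state once for every incoming record."""
    state, accesses = {}, 0
    for key, value in records:
        state[key] = state.get(key, 0) + value   # one read-modify-write per record
        accesses += 1
    return state, accesses

def micro_batched(records, batch_size):
    """Buffer batch_size records, fold them in memory, then update state once per key."""
    state, accesses = {}, 0
    for start in range(0, len(records), batch_size):
        buffer = defaultdict(int)
        for key, value in records[start:start + batch_size]:
            buffer[key] += value                 # in-memory only, no state access
        for key, partial in buffer.items():
            state[key] = state.get(key, 0) + partial
            accesses += 1
    return state, accesses

records = [("A", 1), ("A", 2), ("B", 1), ("A", 3), ("B", 2), ("A", 4)]
print(per_record(records))
print(micro_batched(records, batch_size=3))
```

Both variants end with the same state, but the micro-batched one touches state fewer times. The price is that results are emitted only at batch boundaries.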
25x
Join on
custID
Customer
150 million
Order
1.5 billion
HashJoin
Batch Processing Stream Processing
Join on
custID
Customer
150 million
Order
1.5 billion
ValueState MapState CountAgg
Join on
custID
Customer
150 million
Order
1.5 billion
ValueState ValueState
100 million
CountAgg CountAgg
PK:custID PK:orderID PK:custID PK:orderID
PK:custID
PK:orderID
Stream Processing TPCH13:
State I/O Cost Plays a Big Role in Plan Selection
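The two streaming plans on this slide appear to differ in whether the count aggregation runs after or before the join. The toy sketch below assumes that reading: it checks that both orders give the same per-customer order counts and compares how many entries the order side of the join has to keep. The data and function names are made up for illustration.

```python
from collections import Counter, defaultdict

customers = [1, 2, 3]                               # custID
orders = [(101, 1), (102, 1), (103, 3), (104, 1)]   # (orderID, custID)

def join_then_count(customers, orders):
    # Order side kept as a map per customer (stands in for MapState): one entry per order.
    orders_by_cust = defaultdict(dict)
    for order_id, cust_id in orders:
        orders_by_cust[cust_id][order_id] = (order_id, cust_id)
    counts = {c: len(orders_by_cust.get(c, {})) for c in customers}
    state_entries = sum(len(m) for m in orders_by_cust.values())
    return counts, state_entries

def count_then_join(customers, orders):
    # Orders pre-aggregated to one count per customer (stands in for ValueState).
    count_by_cust = Counter(cust_id for _order_id, cust_id in orders)
    counts = {c: count_by_cust.get(c, 0) for c in customers}
    return counts, len(count_by_cust)

print(join_then_count(customers, orders))
print(count_then_join(customers, orders))
```

Aggregating first shrinks the join input from one row per order to one row per customer, which is the kind of state-size difference a streaming cost model would need to account for.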
Agg
(MaxWithRetract)
Calc
Agg
(Sum)
lineitem
Agg
(Max)
Calc
Agg
(Sum)
lineitem
Result of sum
is ascending
15x
Input value is
unsigned type
Stream Processing TPCH15:
Removing Retraction Operation can Significantly Improve Performance
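Why a plain Max can replace MaxWithRetract here can be sketched as follows. The assumption is that each per-key sum only ever grows (unsigned inputs), so a retracted value is always replaced by a larger one and can never be the maximum afterwards. The class names mirror the slide labels, but the code is hypothetical.

```python
from collections import Counter

class MaxWithRetract:
    """Must remember every live value, because any of them may be retracted."""
    def __init__(self):
        self.values = Counter()
    def accumulate(self, v):
        self.values[v] += 1
    def retract(self, v):
        self.values[v] -= 1
        if self.values[v] == 0:
            del self.values[v]
    def result(self):
        return max(self.values) if self.values else None

class PlainMax:
    """Keeps a single value and simply ignores retractions."""
    def __init__(self):
        self.current = None
    def accumulate(self, v):
        if self.current is None or v > self.current:
            self.current = v
    def result(self):
        return self.current

# Updates from an upstream SUM whose per-key results only ascend:
# key s1 goes 5 -> 12, key s2 goes 7 -> 9.
changes = [("+", 5), ("+", 7), ("-", 5), ("+", 12), ("-", 7), ("+", 9)]

with_retract, plain = MaxWithRetract(), PlainMax()
for op, value in changes:
    if op == "+":
        with_retract.accumulate(value)
        plain.accumulate(value)
    else:
        with_retract.retract(value)    # PlainMax skips retraction messages entirely

print(with_retract.result(), plain.result())
```

Both aggregates agree, but the plain version keeps one value of state instead of a multiset and never processes retraction messages.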
Simple Aggregation
(forwarding) Local-Global Aggregation
[Figure: SUM computed by simple aggregation and by (forwarding) local-global aggregation]
Local-Global Agg to Mitigate Data Skew
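A small sketch of local-global aggregation for SUM, assuming each inner list is the input of one upstream task. The names and data are hypothetical. The point is that a hot key is pre-aggregated before it reaches the global aggregator.

```python
from collections import defaultdict

def simple_agg(partitions):
    """Every record is sent to the global aggregator."""
    total, sent = defaultdict(int), 0
    for part in partitions:
        for key, value in part:
            total[key] += value
            sent += 1
    return dict(total), sent

def local_global_agg(partitions):
    """Each task sums locally first and forwards one partial sum per key."""
    total, sent = defaultdict(int), 0
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value            # local aggregation, no network traffic
        for key, partial in local.items():
            total[key] += partial          # global aggregation merges the partials
            sent += 1
    return dict(total), sent

# Key "A" is hot: it dominates every partition.
partitions = [
    [("A", 1), ("A", 3), ("A", 2), ("B", 7)],
    [("A", 1), ("A", 4), ("B", 3)],
    [("A", 5)],
]

print(simple_agg(partitions))
print(local_global_agg(partitions))
```

The sums are identical, while the number of records reaching the global stage drops from one per input record to one per key per task.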
(forwarding) Local-Global
Aggregation
[Figure: (key, value) records for keys A and B flowing through Map, Local Agg, and Global Agg stages]
(keyed-shuffle) Local-Global
Aggregation
Count
Distinct
Local-Global Agg to Mitigate Data Skew
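For COUNT DISTINCT, pre-aggregating on each task does not help much, because the same value can appear on several tasks. One common remedy, assumed here to be what the "keyed-shuffle" variant refers to, is to shuffle first by (key, bucket of the distinct value), count distinct values per bucket, and then sum the bucket counts per key. The function names and bucket count are hypothetical.

```python
from collections import defaultdict

def count_distinct_direct(records):
    """Reference result: one set of values per key."""
    seen = defaultdict(set)
    for key, value in records:
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}

def count_distinct_split(records, num_buckets):
    # Stage 1: keyed shuffle on (key, bucket). Equal values always land in the
    # same bucket, so a hot key is spread over up to num_buckets subtasks.
    partial = defaultdict(set)
    for key, value in records:
        bucket = hash(value) % num_buckets
        partial[(key, bucket)].add(value)
    # Stage 2: buckets of one key hold disjoint values, so their counts add up.
    result = defaultdict(int)
    for (key, _bucket), values in partial.items():
        result[key] += len(values)
    return dict(result)

records = [("B", 2), ("A", 4), ("A", 2), ("A", 3), ("A", 1), ("A", 2), ("B", 1), ("A", 4)]

print(count_distinct_direct(records))
print(count_distinct_split(records, num_buckets=2))
```

Because the buckets partition the value space, summing the per-bucket distinct counts gives exactly the distinct count per key.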
SQL Query Optimizer
SQL Query Executor
State Storage Engine
Runtime – OS
Resource Conf
10x, 100x, …...
10x
10x
<10x
<10x
Performance Tuning for Stream Processing
Structured Streaming @
Processing 100s of billions of records/hour
1000s of customer streaming apps
in production on Alicloud
Largest app has 1000s of subtasks
and 10s of TB state
Blink Platform (e.g. Alicloud StreamCompute)
• Stream processing can be described by ANSI SQL
• Alibaba Blink SQL follows ANSI SQL
• SQL Optimization of stream processing faces new challenges
and opportunities
• Alibaba Blink Platform (e.g. Alicloud StreamCompute)
operates the world's largest stream processing businesses
Take Away
Thanks
Shaoxuan Wang
wshaoxuan@gmail.com
shaoxuan@apache.org
2018.6.20
We are Hiring!
Hangzhou / Beijing, China
Seattle / Bay Area, US
blink-jobs@list.alibaba-inc.com
