Integrate Flink with Kubernetes natively
Yang Wang
Software Engineer @ Alibaba
Agenda
● Kubernetes introduction
● Evolution of Flink on Kubernetes
● Deep dive into the technicals
● Demo
● Production optimizations
What is Kubernetes?
Kubernetes (K8s) is an open-source system for automating deployment, scaling,
and management of containerized applications.
• Resource management
• Container orchestration
• Operation automation
• Cloud native
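Kubernetes Architecture
[Figure: the Kubernetes Master (API Server, Scheduler, ETCD, Controllers) manages worker nodes (Node 1, Node 2). Each node runs the Kubelet, Kube proxy, plugins, and Docker, which host Pods made of one or more containers pulled from a Docker Registry. Users reach the API through the Dashboard UI or the kubectl CLI.]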
Kubernetes Concepts
• ConfigMap is a dictionary of configuration settings, consisting of key-value pairs of strings.
• Service is an abstract way to expose an application running on a set of Pods as a network service.
• Pod is the smallest deployable unit and consists of one or more containers.
• Deployment is a higher-level abstraction that manages a set of identical Pods.
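These concepts are plain API objects. As a minimal sketch, the Python below builds a ConfigMap, a Service, and a Deployment for a Flink JobManager as dictionaries that could be serialized to JSON and passed to `kubectl apply -f -`. The helper names, object names, labels, and the `flink:1.10` image tag are illustrative assumptions; 6123 and 8081 are Flink's default JobManager RPC and REST ports.

```python
import json

def config_map(name: str, flink_conf: dict) -> dict:
    """ConfigMap: string key/value pairs; here one key holds a flink-conf.yaml body."""
    body = "\n".join(f"{k}: {v}" for k, v in flink_conf.items())
    return {"apiVersion": "v1", "kind": "ConfigMap",
            "metadata": {"name": name}, "data": {"flink-conf.yaml": body}}

def service(name: str, selector: dict, ports: dict) -> dict:
    """Service: a stable network endpoint for the Pods whose labels match `selector`."""
    return {"apiVersion": "v1", "kind": "Service", "metadata": {"name": name},
            "spec": {"selector": selector,
                     "ports": [{"name": n, "port": p} for n, p in ports.items()]}}

def deployment(name: str, labels: dict, image: str, replicas: int, ports: list) -> dict:
    """Deployment: keeps `replicas` identical Pods, built from the template, running."""
    container = {"name": name, "image": image,
                 "ports": [{"containerPort": p} for p in ports]}
    return {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": name},
            "spec": {"replicas": replicas,
                     "selector": {"matchLabels": labels},
                     "template": {"metadata": {"labels": labels},
                                  "spec": {"containers": [container]}}}}

labels = {"app": "flink", "component": "jobmanager"}  # assumed labels
manifests = [
    config_map("flink-config", {"jobmanager.rpc.address": "flink-jobmanager",
                                "taskmanager.numberOfTaskSlots": 2}),
    service("flink-jobmanager", labels, {"rpc": 6123, "ui": 8081}),
    deployment("flink-jobmanager", labels, "flink:1.10", 1, [6123, 8081]),
]

# kubectl accepts a v1 List object in JSON as well as YAML manifests.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The Deployment's selector and its Pod template share the same labels, and the Service selects on them too; that label match is what ties the three objects together.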
More and more workloads run on Kubernetes
Computing: ECS, EBM, GPU, FPGA, ECI
Network: VPC, ENI, RDMA, SLB, DNS
Storage: EBS, NAS, CPFS, OSS
Public Cloud, Edge Computing, Private Cloud
Kubernetes (EKS @ Amazon, ACK @ Alibaba Cloud, GKE @ Google, etc.)
Web/mobile applications
• Stateless
• Idempotent
• Horizontally scalable
Kafka, Elasticsearch, TensorFlow, Spark, Flink, Redis, MySQL
Why Flink on Kubernetes?
• Container environment: easy to set up, clean up, and reproduce
• Multiple tenants: better resource/network isolation and security
• Mixed workloads: running alongside online web services, machine learning, search engines, etc. for better resource utilization
• Leverage the rich Kubernetes ecosystem, e.g. logging, monitoring, etc.
How do Flink and Kubernetes work together?
Standalone session on Kubernetes
[Figure: with kubectl, the user asks the K8s Master to create a Flink Master Deployment (Dispatcher, K8sResMngr, JobMaster), a ConfigMap, a Service (SVC), and a TaskManager Deployment; the Flink Client then submits jobs through the Service.]
• Standalone Flink cluster
• No changes to Flink required
• Static resource allocation
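"Static resource allocation" means the slot pool is fixed by the TaskManager Deployment's replica count, regardless of how many jobs arrive. The toy model below uses hypothetical function names, assumes each job needs as many slots as its maximum operator parallelism (Flink's default slot sharing), and ignores every other aspect of scheduling.

```python
from typing import List, Tuple

def total_slots(tm_replicas: int, slots_per_tm: int) -> int:
    """Capacity is decided up front by the TaskManager Deployment."""
    return tm_replicas * slots_per_tm

def admit_jobs(jobs: List[Tuple[str, int]], tm_replicas: int,
               slots_per_tm: int) -> Tuple[List[str], List[str], int]:
    """Greedily place (name, parallelism) jobs into the fixed pool; the rest wait."""
    free = total_slots(tm_replicas, slots_per_tm)
    running: List[str] = []
    waiting: List[str] = []
    for name, parallelism in jobs:
        if parallelism <= free:
            free -= parallelism
            running.append(name)
        else:
            waiting.append(name)  # nothing adds TaskManagers automatically
    return running, waiting, free

running, waiting, free = admit_jobs([("a", 4), ("b", 6), ("c", 2)],
                                    tm_replicas=3, slots_per_tm=2)
print(running, waiting, free)  # ['a', 'c'] ['b'] 0
```

Job `b` only starts once someone scales the TaskManager Deployment by hand (for example with `kubectl scale`).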
Standalone per-job on Kubernetes
• User jar and dependencies are built into the image
• A dedicated Flink cluster is started for each job
• One-step submission
• The user main method runs in the cluster
[Figure: the Standalone JobCluster EntryPoint starts the Dispatcher, ResourceManager, and JobMaster; the JobGraph is retrieved from the classpath and the job is recovered from it.]
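One way to picture the two arrows in the figure: on startup the entrypoint prefers a JobGraph that was already persisted, so a restarted master resumes the job instead of resubmitting it, and otherwise builds the JobGraph from the jar baked into the image. This is a hypothetical model; `start_job_cluster` and the in-memory store are illustrative and not Flink's `StandaloneJobClusterEntryPoint` code.

```python
from typing import Callable, Dict

def start_job_cluster(job_store: Dict[str, dict],
                      build_from_classpath: Callable[[], dict],
                      job_id: str) -> dict:
    """Hypothetical per-job entrypoint startup: recover if possible, else build."""
    persisted = job_store.get(job_id)
    if persisted is not None:
        # A previous master already ran this job: resume it instead of resubmitting.
        return {**persisted, "source": "recovered"}
    graph = build_from_classpath()  # user jar is baked into the image
    job_store[job_id] = graph       # persist so a restarted master can recover it
    return {**graph, "source": "classpath"}

store: Dict[str, dict] = {}
first = start_job_cluster(store, lambda: {"job": "WordCount"}, "job-1")
second = start_job_cluster(store, lambda: {"job": "unused"}, "job-1")
print(first["source"], second["source"], second["job"])  # classpath recovered WordCount
```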
Helm chart
Helm is the first application package manager running atop Kubernetes.
Flink Kubernetes Operator
[Figure: Kubernetes Operator, picture from https://github.com/lyft/flinkk8soperator]
• Easy to use
• Management of multiple Flink clusters
• Management of the whole application lifecycle
- Restart, upgrade
• Each Flink application runs a single job
• lyft/flinkk8soperator
• GoogleCloudPlatform/flink-on-k8s-operator
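At the heart of an operator is a reconcile loop: it compares the desired state declared in a custom resource with what is actually running and emits the actions that close the gap. The sketch below is a simplified, hypothetical model; `FlinkAppSpec`, `ObservedState`, and the action names are illustrative and are not the API of either operator listed above.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class FlinkAppSpec:
    """Desired state, as a user would declare it in a custom resource."""
    image: str
    parallelism: int

@dataclass(frozen=True)
class ObservedState:
    """What is running now; image=None means no cluster exists yet."""
    image: Optional[str]
    parallelism: int = 0
    job_failed: bool = False

def reconcile(desired: FlinkAppSpec, observed: ObservedState) -> List[str]:
    """Return the actions needed to move the observed state toward the desired one."""
    if observed.image is None:
        return ["create-cluster", "submit-job"]
    if (observed.image, observed.parallelism) != (desired.image, desired.parallelism):
        # Upgrade: keep state by stopping via a savepoint and restarting from it.
        return ["take-savepoint", "delete-cluster", "create-cluster",
                "submit-job-from-savepoint"]
    if observed.job_failed:
        return ["restart-job-from-latest-checkpoint"]
    return []  # converged; nothing to do until the next change

spec = FlinkAppSpec(image="my-job:v2", parallelism=4)
print(reconcile(spec, ObservedState(image="my-job:v1", parallelism=4)))
```

Because the loop only looks at current state, it can be re-run safely after any crash of the operator itself; that idempotence is what lets operators "systematize human knowledge as code".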
Current limitations
• Flink is not aware of the Kubernetes cluster
• Static resource allocation
• Users need upfront knowledge of containers, operators, and environment-specific tools like “kubectl”
• Inconvenient for batch jobs and for running multiple jobs in a session
Now it’s time for native integration. We are NOT going to:
• Replace standalone on K8s
• Replace flink-k8s-operator
What does native mean?
• Self-contained
- Embedded K8s client
- No external tools needed to start/stop a Flink cluster
• The Flink client natively contacts the Kubernetes API server to create the JobManager
• The Flink ResourceManager natively contacts Kubernetes to create TaskManager Pods on demand
- Similar to the YARN and Mesos integrations
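"On demand" means the ResourceManager turns pending slot requests into Pod requests instead of relying on a fixed pool. A minimal sketch of that arithmetic, with hypothetical names and assuming every TaskManager Pod offers the same number of slots:

```python
import math

def pods_to_request(pending_slot_requests: int, slots_per_tm: int,
                    pending_pods: int) -> int:
    """New TaskManager Pods to ask the API server for.

    Pods already requested but not yet registered will bring slots too,
    so they are subtracted to avoid over-provisioning.
    """
    if slots_per_tm <= 0:
        raise ValueError("slots_per_tm must be positive")
    needed = math.ceil(pending_slot_requests / slots_per_tm)
    return max(0, needed - pending_pods)

print(pods_to_request(pending_slot_requests=10, slots_per_tm=4, pending_pods=1))  # 2
```

Releasing TaskManager Pods that stay idle is the reverse direction of the same bookkeeping and is left out here.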
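Native Kubernetes session
[Figure: the Flink Client uses its embedded K8s Client to ask the K8s Master to create the Flink Master Deployment (a Pod running the Dispatcher, K8sResMngr, and JobMaster), a ConfigMap, and a Service (SVC); the K8sResMngr then asks the K8s Master for TaskManager Pods. Images come from a Docker Registry; the diagram also shows distributed storage (HDFS, S3) and the Flink Dashboard.]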
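In practice the native session is driven from Flink's command line. The sketch below only assembles the command lines so the configuration keys are visible; the script names and keys (`kubernetes-session.sh`, `-e kubernetes-session`, `kubernetes.cluster-id`, `kubernetes.taskmanager.cpu`, `taskmanager.memory.process.size`) follow the Flink 1.10 native Kubernetes documentation and may differ in other versions. The cluster id, resource values, and jar path are placeholders.

```python
import shlex
from typing import Dict, List

def session_start_cmd(cluster_id: str, conf: Dict[str, object]) -> List[str]:
    """Command that asks Kubernetes for a new native session cluster."""
    return (["./bin/kubernetes-session.sh", f"-Dkubernetes.cluster-id={cluster_id}"]
            + [f"-D{key}={value}" for key, value in conf.items()])

def submit_cmd(cluster_id: str, jar: str) -> List[str]:
    """Command that submits a job, detached, to the session with that cluster id."""
    return ["./bin/flink", "run", "-d", "-e", "kubernetes-session",
            f"-Dkubernetes.cluster-id={cluster_id}", jar]

start = session_start_cmd("my-session", {
    "taskmanager.memory.process.size": "4096m",
    "kubernetes.taskmanager.cpu": 2,
    "taskmanager.numberOfTaskSlots": 4,
})
print(shlex.join(start))
print(shlex.join(submit_cmd("my-session", "examples/streaming/WindowJoin.jar")))
```

No TaskManager count appears anywhere: TaskManager Pods are created only when a submitted job requests slots.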
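Native Kubernetes per-job
[Figure: as in the native session, the Flink Client uses its embedded K8s Client to ask the K8s Master to create the Flink Master Deployment, a ConfigMap, and a Service (SVC), and the K8sResMngr requests TaskManager Pods on demand; here the Flink Master Pod runs a Cluster Entrypoint alongside the Dispatcher, K8sResMngr, and JobMaster. The diagram also shows a Docker Registry, distributed storage (HDFS, S3), and the Flink Dashboard.]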
Session vs. per-job
• Where is the user main code executed?
- Session: on the client
- Per-job: in the cluster
• How are the job graph and user jars distributed?
- Session: uploaded via the REST client and localized by the Flink distributed cache
- Per-job: built into the image or downloaded by an init container
• Isolation between different jobs
- Session: jobs share one cluster
- Per-job: each job gets a dedicated cluster
• Cluster lifecycle
- Session: started and stopped manually
- Per-job: bound to its single job
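For per-job clusters the user jar must be inside the JobManager Pod before Flink starts. A minimal sketch of the init-container variant: Kubernetes runs init containers to completion before starting the main containers, and a shared `emptyDir` volume hands the downloaded jar over. The helper name, image tags, the `/opt/flink/usrlib` path, and the jar URL are assumptions.

```python
def perjob_jobmanager_pod(cluster_id: str, flink_image: str, jar_url: str) -> dict:
    """Pod spec in which an init container fetches the user jar for the JobManager."""
    jar_dir = "/opt/flink/usrlib"  # assumed directory where the entrypoint looks for user jars
    mount = {"name": "usrlib", "mountPath": jar_dir}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{cluster_id}-jobmanager", "labels": {"app": cluster_id}},
        "spec": {
            "volumes": [{"name": "usrlib", "emptyDir": {}}],  # lives as long as the Pod
            "initContainers": [{
                "name": "fetch-user-jar",
                "image": "busybox:1.31",
                "command": ["wget", "-O", f"{jar_dir}/job.jar", jar_url],
                "volumeMounts": [mount],
            }],
            "containers": [{
                "name": "flink-jobmanager",
                "image": flink_image,
                "volumeMounts": [mount],
            }],
        },
    }

pod = perjob_jobmanager_pod("wordcount", "flink:1.10", "https://example.com/jobs/wordcount.jar")
print(pod["spec"]["initContainers"][0]["command"])
```

Compared with baking the jar into the image, this keeps one generic Flink image and moves the per-job artifact to download time.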
Let’s have a taste.
Demo: native Kubernetes session and per-job clusters
Native Kubernetes high availability
[Figure: one JobManager (Leader) and several JobManagers (Standby) run as Pods of a Deployment and contend, through the Kubernetes API Server, for a leader lock stored in a ConfigMap. The leader publishes its RPC address in a ConfigMap that TaskManagers retrieve and watch. Job statuses, job graph metadata, and checkpoint metadata are kept in ConfigMaps that hold file references to the job graphs and checkpoints in distributed storage (HDFS, S3, OSS).]
High availability
• No external dependency (such as ZooKeeper)
• Multiple JobManagers
• Fast recovery
• Session and per-job
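Leader election through a ConfigMap relies on the API server's optimistic concurrency: every object carries a resourceVersion, and an update based on a stale version is rejected with a conflict. Below is a minimal in-memory model of that idea. `FakeConfigMapStore`, `try_acquire_or_renew`, the lease length, and the data keys are hypothetical, resource versions are simplified to integers, and none of it reflects Flink's internal implementation.

```python
from typing import Dict, Optional, Tuple

class ConflictError(Exception):
    """Update rejected because it was based on a stale resourceVersion."""

class FakeConfigMapStore:
    """In-memory stand-in for the API server's optimistic concurrency on ConfigMaps."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[int, dict]] = {}

    def get(self, name: str) -> Tuple[int, dict]:
        return self._objects.get(name, (0, {}))

    def replace(self, name: str, expected_version: int, data: dict) -> int:
        current_version, _ = self.get(name)
        if current_version != expected_version:
            raise ConflictError(name)  # someone else wrote in between
        self._objects[name] = (current_version + 1, dict(data))
        return current_version + 1

def try_acquire_or_renew(store: FakeConfigMapStore, lock: str, candidate: str,
                         address: str, now: float, lease_seconds: float) -> bool:
    """Become (or stay) leader if the lock is free, expired, or already ours."""
    version, data = store.get(lock)
    holder: Optional[str] = data.get("holder")
    if holder not in (None, candidate) and now - data["renewTime"] < lease_seconds:
        return False  # another JobManager holds a live lease
    try:
        store.replace(lock, version,
                      {"holder": candidate, "address": address, "renewTime": now})
        return True
    except ConflictError:
        return False  # lost the race; retry on the next tick

def leader_address(store: FakeConfigMapStore, lock: str) -> Optional[str]:
    """What a TaskManager would read (and watch) to find the current leader."""
    return store.get(lock)[1].get("address")

store = FakeConfigMapStore()
ok0 = try_acquire_or_renew(store, "jm-leader", "jm-0", "10.0.0.1:6123", now=0, lease_seconds=15)
ok1 = try_acquire_or_renew(store, "jm-leader", "jm-1", "10.0.0.2:6123", now=5, lease_seconds=15)
print(ok0, ok1, leader_address(store, "jm-leader"))  # True False 10.0.0.1:6123
```

A standby takes over only after the leader stops renewing for a full lease; a production version must also tolerate clock skew between JobManagers.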
Log and metrics
[Figure: inside a Kubernetes Pod, the Flink TaskManager's logs reach HDFS/S3/ES either directly through a Log4j2 Appender or through a Shared Volume read by a Sidecar Container (fluentd, etc.); its Metrics Reporter feeds Prometheus.]
Logging collector
• Log4j2 Appender
• Sidecar Container
• DaemonSet
Metrics collector
• Push gateway
• Multiple replicas
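The sidecar option works because both containers mount the same volume: Flink only writes log files, and the sidecar (fluentd in the figure) tails them and forwards new lines. A minimal Python sketch of that division of labor, with the forwarding step left out; the file name and function names are hypothetical.

```python
import os
import tempfile
from typing import List, Tuple

LOG_FILE = "taskmanager.log"  # assumed file name on the shared volume

def append_log(volume_dir: str, line: str) -> None:
    """Flink container: a file appender writes log lines onto the shared volume."""
    with open(os.path.join(volume_dir, LOG_FILE), "a", encoding="utf-8") as f:
        f.write(line + "\n")

def ship_new_lines(volume_dir: str, offset: int) -> Tuple[List[str], int]:
    """Sidecar: read complete lines appended since `offset`; return them and the new offset."""
    path = os.path.join(volume_dir, LOG_FILE)
    if not os.path.exists(path):
        return [], offset
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset  # nothing new, or only a partial line so far
    lines = chunk[:end].decode("utf-8").split("\n")
    return lines, offset + end + 1

with tempfile.TemporaryDirectory() as shared_volume:
    append_log(shared_volume, "TaskManager started")
    append_log(shared_volume, "Registered at ResourceManager")
    batch1, offset = ship_new_lines(shared_volume, 0)
    append_log(shared_volume, "Checkpoint 1 completed")
    batch2, offset = ship_new_lines(shared_volume, offset)

print(batch1)
print(batch2)
```

Tracking a byte offset and shipping only complete lines is what keeps a restarted sidecar from duplicating or splitting log entries, provided the offset is itself persisted.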
Network
Network plugin
• Flannel
• Host
• Cloud CNI plugin
- AWS
- Azure
- Alibaba Cloud
- Google Cloud
- …
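With a cloud CNI plugin such as the Amazon VPC CNI, every Pod takes a real VPC IP address, so the instance's ENI and secondary-IP limits cap Pod density per node (the speaker notes for this slide mention these limits). Below is the formula AWS has published for the default secondary-IP mode; the "-1" reflects that each ENI's primary address is not handed to Pods, and the "+2" is commonly attributed to host-network Pods such as aws-node and kube-proxy. The instance figures in the comments are examples from the EC2 documentation and should be checked against current values.

```python
def eks_max_pods(max_enis: int, ipv4_per_eni: int) -> int:
    """Max Pods per node with the Amazon VPC CNI plugin in secondary-IP mode."""
    if max_enis < 1 or ipv4_per_eni < 1:
        raise ValueError("an instance has at least one ENI with one address")
    return max_enis * (ipv4_per_eni - 1) + 2

print(eks_max_pods(3, 10))  # m5.large: 3 ENIs x 10 IPv4 addresses -> 29
print(eks_max_pods(4, 15))  # m5.xlarge: 4 ENIs x 15 IPv4 addresses -> 58
```

For Flink this matters when many small TaskManager Pods are requested on demand: the node may run out of IP addresses before it runs out of CPU or memory.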
Current State
● Kubernetes session
- FLINK-9953
- Released in 1.10
● Kubernetes per-job
- FLINK-10934
- In progress, planned for 1.11
● Native high availability
- FLINK-12884
- Implemented internally; will be contributed to the community soon
● Advanced features
- FLINK-14460
- All planned for 1.11
- Label, annotation, node-selector
- Toleration
- Sidecar container
- Init container
- Pod template
- …
Thank you! Questions, please.

Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang


Editor's Notes

  • Flink Kubernetes Operator slide: Operators extend Kubernetes functionality. Operators systematize human knowledge as code.
  • Network slide: Amazon EKS supports native VPC networking via the Amazon VPC CNI plugin for Kubernetes. Using this CNI plugin allows Kubernetes Pods to have the same IP address inside the Pod as they do on the VPC network. This CNI plugin is an open-source project maintained on GitHub (see https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html). Elastic network interface and secondary IP address limits per Amazon EC2 instance type apply; in general, larger instances can support more IP addresses. For more information, see IP Addresses Per Network Interface Per Instance Type in the Amazon EC2 User Guide for Linux Instances.