Kubernetes Networking with Cilium
Deep Dive
Michal Rostecki
Software Engineer
mrostecki@suse.de mrostecki@opensuse.org
BPF
What is BPF?
The Linux network stack has many abstraction layers
● Application Layer
● System Call Interface
● Sockets
● Protocols (TCP, UDP)
● Traffic Shaping
● sk_buff
● Network drivers
BPF allows hooking into them
● XDP – DMA to the NIC (at the network drivers)
● BPF – after the kernel parses the packet (sk_buff, traffic shaping)
● BPF – system call tracing (at the system call interface)
● BPF – sockmap, sockops (at the sockets layer)
BPF goes into firewalls
[Bar chart: packet-processing throughput in Mpps compared across iptables, nftables, bpfilter (host driver), and bpfilter (hardware offload).]
BPF goes into...
● Load balancers - katran
● perf
● systemd
● Suricata
● Open vSwitch - AF_XDP
● And many, many others
Cilium
What is Cilium?
Cilium as a CNI plugin
[Diagram: Node A runs Pod A and Pod B, Node B runs Pod C; each pod's container connects through its eth0 interface to the Cilium + BPF datapath on its node.]
Networking modes

Encapsulation
Use case: Cilium handles routing between nodes.
[Diagram: Node A, Node B, and Node C interconnected by VXLAN tunnels.]

Direct routing
Use case: using cloud provider routers, or a BGP routing daemon.
[Diagram: Node A, Node B, and Node C interconnected through cloud or BGP routing.]
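As a hedged sketch (assuming the standard cilium-config ConfigMap with its tunnel and auto-direct-node-routes keys), switching between the two modes could look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Encapsulation: wrap pod-to-pod traffic between nodes in VXLAN.
  tunnel: vxlan
  # Direct routing instead: disable tunneling and install node routes,
  # or defer to cloud/BGP routing:
  # tunnel: disabled
  # auto-direct-node-routes: "true"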
L3 filtering – label-based, ingress
[Diagram: pods labeled role=frontend (10.0.0.1, 10.0.0.2, 10.0.0.4) are allowed to reach the pod labeled role=backend (10.0.0.3); the unlabeled pod (10.0.0.5) is denied.]
L3 filtering – label-based, ingress
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "frontend-backend"
spec:
  description: "Allow frontends to access backends"
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        role: frontend
L3 filtering – CIDR-based, egress
[Diagram: in Cluster A, the pod labeled role=backend (10.0.0.1) is allowed to reach 10.0.1.1 in subnet 10.0.1.0/24; any IP not belonging to 10.0.1.0/24, such as 10.0.2.1 in 10.0.2.0/24, is denied.]
L3 filtering – CIDR-based, egress
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "backend-egress"
spec:
  description: "Allow backends to access 10.0.1.0/24"
  endpointSelector:
    matchLabels:
      role: backend
  egress:
  - toCIDR:
    - "10.0.1.0/24"
L4 filtering
[Diagram: the pod labeled role=backend (10.0.0.1) accepts traffic on TCP/80; any other port is denied.]
L4 filtering
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "frontend-backend"
spec:
  description: "Allow access to backends only on TCP/80"
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: "TCP"
L7 filtering – API Aware Security
[Diagram: a pod (10.0.0.5) calls the pod labeled role=api (10.0.0.1); GET /articles/{id} is allowed, GET /private is denied.]
L7 filtering – API Aware Security
endpointSelector:
  matchLabels:
    role: api
ingress:
- toPorts:
  - ports:
    - port: "80"
      protocol: "TCP"
    rules:
      http:
      - method: "GET"
        path: "/articles/[0-9]+"  # matches GET /articles/{id} from the diagram
Standalone proxy, L7 filtering
[Diagram: on each of Node A and Node B, Cilium + BPF generates BPF programs for L3/L4 filtering, while an Envoy instance generates BPF programs for L7 filtering through libcilium.so; Envoy proxies the traffic between Pod A and Pod B.]
Features
Cluster Mesh
[Diagram: Cluster A (Node A running Pod A and Pod B) and Cluster B (Node B running Pod C); Cilium + BPF runs on every node, and the clusters share state through an external etcd.]
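A minimal sketch of wiring a cluster into the mesh, assuming the cluster-name and cluster-id keys of the same cilium-config ConfigMap; each cluster needs a unique pair, and the agents coordinate through the external etcd:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Unique identity of this cluster within the mesh.
  cluster-name: cluster-a
  cluster-id: "1"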
Istio without Cilium
[Diagram: inside each pod, traffic between the application socket and the Envoy sidecar socket travels over loopback, then leaves the node through eth0 via the CNI driver.]
Here packets need to go through the whole kernel network abstraction, using the TCP protocol – a performance loss.
Istio with Cilium and sockmap
[Diagram: the same layout, but Cilium + BPF forwards traffic directly between the application socket and the Envoy socket, skipping the kernel network abstraction on the node-local path.]
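As a hedged sketch, assuming the sockops-enable key that the cilium-config ConfigMap exposed for this (beta) feature – an excerpt from the same ConfigMap as above:

data:
  # Shortcut node-local socket-to-socket traffic (app <-> sidecar) with
  # BPF sockmap/sockops instead of the full TCP/IP stack.
  sockops-enable: "true"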
Kubernetes Services

BPF, Cilium:
● Hash table.
● Search O(1), Insert O(1), Delete O(1).

iptables, kube-proxy:
● Linear list.
● All rules in the chain have to be replaced as a whole.
● Search O(n), Insert O(1), Delete O(n).
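Opting into the BPF service path is a configuration switch; a sketch assuming a newer Cilium release that ships the kube-proxy-replacement key – again an excerpt from cilium-config:

data:
  # Serve Kubernetes Services from BPF hash tables instead of the
  # iptables chains programmed by kube-proxy.
  kube-proxy-replacement: "strict"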
Kubernetes Services – benchmark
[Chart: latency in usec vs. number of services in the cluster (1, 100, 1000, 2000, 2768), comparing Cilium (BPF) with kube-proxy (iptables).]
CNI chaining
● Cilium: policy enforcement, load balancing, multi-cluster.
● Chained plugin: IP allocation, configuring the network interface, encapsulation/routing.
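A hedged sketch of switching chaining on, assuming the cni-chaining-mode key and its generic-veth value – an excerpt from cilium-config:

data:
  # The chained plugin allocates IPs and creates the veth pair;
  # Cilium attaches BPF programs to it for policy and load balancing.
  cni-chaining-mode: generic-veth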
Native support for AWS ENI
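A sketch assuming the ipam key and its eni value in ENI-capable Cilium releases – an excerpt from cilium-config:

data:
  # Allocate pod IPs directly from AWS ENIs attached to the node.
  ipam: eni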
To sum it up
Why is Cilium awesome?
● It makes the disadvantages of iptables disappear, and it always gets the best from the Linux kernel.
● Cluster Mesh / multi-cluster.
● Makes Istio faster.
● Offers L7 API-aware filtering as a Kubernetes resource.
● Integrates with the other popular CNI plugins – Calico, Flannel, Weave, Lyft, AWS CNI.