When One Data Center is not Enough
Building large-scale stream infrastructure across multiple data centers with Apache Kafka
Guozhang Wang, Strata San Jose, 2016
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
Why across Data Centers?
Why across Data Centers
• Catastrophic / expected failures
• Routine maintenance
• Geo-locality (Example: CDNs)
Why NOT across Data Centers
• Low bandwidth (10Mbps - 1Gbps)
• High latency (50ms - 450ms)
• Much More $$$
Why NOT across Data Centers
• … is hard and expensive
• … with real-time writes? Harder
• … consistently? Oh My!
Consistency
• Weak
• Eventual
• Strong
[Diagram: consistency spectrum from weak to strong, trading higher latency for a stronger guarantee]
Weak / No Consistency
• Now you see my writes, now you don’t
• Best effort only, data can be stale
• Examples: think of “caches”, VoIP
Eventual Consistency
• You will see my writes, … eventually
• May need to resolve conflicts (manually)
• Examples: think of “emails”, SMTP
Strong Consistency
• You get what you write, for sure
• External > Sequential > Causal (Session)
• Examples: RDBMS, file systems
Latency vs. Consistency
• LAN: consistency over latency
• WAN: latency over consistency
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
Option I: Don’t do it
• Bunkerize the single data center
• Expect data loss at failures
• Examples: ??
Option II: Primary with Hot Standby
• Failover to hot standby (maybe inconsistent)
• Window of data loss at failures
• Examples: MySQL binlog
Option III: Active-Active
• Accepts writes in multiple DCs
• Resolve conflicts (strong / weak consistency)
• Examples: Amazon DynamoDB (vector clocks), Google Spanner (2PC), Mesa (Paxos)
Ordering is the Key!
Ordering is Key
• Vector clocks: partial ordering
• Paxos, 2PC: global ordering
• Log shipping: logical ordering (per-partition)
Apache Kafka
• A distributed messaging system … that stores messages as a log!
Store Messages as a Log
[Diagram: a topic partition as an append-only log of messages with increasing offsets; the producer writes at the tail while Consumer1 reads at offset 7 and Consumer2 reads at offset 10]
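(Not part of the original slides.) A minimal sketch of reading from an explicit offset with the Java consumer (0.9+ API), mirroring "Consumer1 reads at offset 7" in the diagram; the broker address, topic name, and offset are placeholders.

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;

  public class ReadFromOffsetSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
          props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

          KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
          TopicPartition tp = new TopicPartition("my-topic", 0);  // placeholder topic / partition
          consumer.assign(Collections.singletonList(tp));         // manual assignment, no consumer group
          consumer.seek(tp, 7L);                                  // start reading at offset 7, like Consumer1
          ConsumerRecords<String, String> records = consumer.poll(1000);
          for (ConsumerRecord<String, String> r : records)
              System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
          consumer.close();
      }
  }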
Partition the Log across Machines
[Diagram: Topic 1 and Topic 2 split into partitions spread across the brokers, with producers writing to and consumers reading from those partitions]
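(Not part of the original slides.) A hedged sketch of the partitioning idea with the Java producer: records that carry the same key hash to the same partition of the topic, so per-key ordering is preserved; the broker address, topic, and keys are placeholders.

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.Producer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class KeyedProduceSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

          Producer<String, String> producer = new KafkaProducer<>(props);
          for (int i = 0; i < 10; i++) {
              String key = "user-" + (i % 3);  // same key -> same partition of "topic1"
              producer.send(new ProducerRecord<>("topic1", key, "event-" + i));
          }
          producer.close();  // flushes outstanding sends
      }
  }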
Configurable ISR Commits
ACK mode    Latency                   On failures
"no"        no network delay          some data loss
"leader"    1 network round trip      a little data loss
"all"       ~2 network round trips    no data loss
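(Not part of the original slides.) A rough illustration of how the ACK modes above map onto the Java producer's acks setting; in the client the values are "0", "1", and "all", and the broker and topic names are placeholders.

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class AckModeSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          // acks = "0"   -> fire and forget ("no" above): lowest latency, may lose data
          // acks = "1"   -> wait for the partition leader only ("leader" above)
          // acks = "all" -> wait for the in-sync replicas to commit: highest durability
          props.put("acks", "all");

          KafkaProducer<String, String> producer = new KafkaProducer<>(props);
          producer.send(new ProducerRecord<>("my-topic", "key", "value"));
          producer.close();
      }
  }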
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
Option I: Active-Passive Replication
[Diagram: producers and consumers use the local Kafka cluster in DC 1; MirrorMaker copies it to a replica Kafka cluster in DC 2]
Option I: Active-Passive Replication
• Async replication across DCs
• May lose data on failover
• Example: ETL to data warehouse / HDFS
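(Not part of the original slides.) Conceptually, MirrorMaker is just a set of consumers on the source (DC 1) cluster feeding a producer on the target (DC 2) cluster; a stripped-down sketch of that loop, with made-up cluster addresses and topic name:

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class MiniMirrorSketch {
      public static void main(String[] args) {
          Properties cons = new Properties();
          cons.put("bootstrap.servers", "kafka-dc1:9092");  // hypothetical source cluster (DC 1)
          cons.put("group.id", "mirror-sketch");
          cons.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
          cons.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

          Properties prod = new Properties();
          prod.put("bootstrap.servers", "kafka-dc2:9092");  // hypothetical target cluster (DC 2)
          prod.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
          prod.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

          KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cons);
          KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(prod);
          consumer.subscribe(Collections.singletonList("my-topic"));  // placeholder topic

          while (true) {
              ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
              for (ConsumerRecord<byte[], byte[]> r : records)
                  // Copied records get new offsets on the target cluster
                  // (see the "offsets across DCs" caveats later in the deck).
                  producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
          }
      }
  }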
Option II: Active-Active Replication
[Diagram: producers write to a local Kafka cluster in each DC; MirrorMaker copies both local clusters into an aggregate Kafka cluster in each DC; consumers read from their DC's aggregate cluster and switch to DC 2's aggregate cluster on DC 1 failure]
Option II: Active-Active Replication
• Global view on the aggregate cluster
• Requires offsets to resume consumption after failover
• Example: store materialization, index updates
Caveats: offsets across DCs
• Offsets are not identical between Kafka clusters
• Duplicates during failover
• Partition selection may be different
• Solutions:
  • Resume from the log end offset (suitable for real-time apps)
  • Resume from a timestamp (ListOffsets; time-based offset index: KIP-33)
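(Not part of the original slides.) A hedged sketch of the two resume strategies in the Java consumer: seekToEnd covers "resume from the log end offset", and offsetsForTimes is the timestamp lookup that the KIP-33 time index enables, though that call only appeared in later client versions; treat both as illustrative.

  import java.util.Collection;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
  import org.apache.kafka.common.TopicPartition;

  public class FailoverResumeSketch {

      // Strategy 1: real-time apps simply jump to the log end of the standby cluster.
      static void resumeFromLogEnd(KafkaConsumer<?, ?> consumer, Collection<TopicPartition> partitions) {
          consumer.seekToEnd(partitions);  // newer clients take a Collection; 0.9 used varargs
      }

      // Strategy 2: reprocess from a wall-clock timestamp, e.g. shortly before the failover.
      static void resumeFromTimestamp(KafkaConsumer<?, ?> consumer,
                                      Collection<TopicPartition> partitions, long timestampMs) {
          Map<TopicPartition, Long> query = new HashMap<>();
          for (TopicPartition tp : partitions)
              query.put(tp, timestampMs);
          Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);
          for (Map.Entry<TopicPartition, OffsetAndTimestamp> e : offsets.entrySet())
              if (e.getValue() != null)  // null when no message exists at or after the timestamp
                  consumer.seek(e.getKey(), e.getValue().offset());
      }
  }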
Option III: Deploy across DCs
[Diagram: a single Kafka cluster stretched across DC 1 and DC 2, with producers and consumers in both DCs]
Option III: Deploy across DCs
• Multi-tenancy support
  • Security (0.9)
  • Quota management (0.9)
• Latency optimization
  • Rack-aware partition assignment (0.10)
  • Read affinity (future?)
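(Not part of the original slides.) For the rack-awareness bullet: since 0.10 (KIP-36) each broker can advertise its location with broker.rack in server.properties, and replica assignment for new topics then spreads each partition's replicas across the declared racks/DCs; the labels below are placeholders.

  # server.properties on brokers in DC 1
  broker.rack=dc-1
  # server.properties on brokers in DC 2
  broker.rack=dc-2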
Example: EC2 multi-AZ Deployment
• Same region: essentially the same network
  • Asymmetric partitioning is rare, latency is low
• Need at least 3 DCs (AZs) for ZooKeeper
• Reserved instances to reduce churn
• EIPs for external clients, private IPs for internal communication
• Reserved instances, local storage
Take-aways
• Multi-DC: trade-off between latency and consistency
• Kafka: replicated log streams for multihoming
Thank you
Guozhang | guozhang@confluent.io | @guozhangwang
Meet Confluent in booth #838
Confluent University ~ Kafka training ~ confluent.io/training
Join the Stream Data Hackathon, Apr 25, SF ~ kafka-summit.org/hackathon/
Download Apache Kafka & Confluent Platform ~ confluent.io/download