Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
The document discusses building large-scale streaming infrastructure across multiple data centers using Apache Kafka. It outlines the reasons for multi-data center architecture, design patterns, and the trade-offs between latency and consistency. Various options for data replication and management are presented, including active-active and active-passive replication strategies.
Discusses the necessity of multi-data center setups due to failures and geo-locality, along with challenges like bandwidth, latency, and consistency.
Explores different consistency models: Weak, Eventual, and Strong consistency with examples, stressing their implications for data integrity in distributed systems.
Examines the trade-offs between consistency and latency in WAN vs. LAN environments, setting the stage for the architecture design topics ahead.
Describes various strategies for data center deployment including Bunkerizing, Primary with Hot Standby, and Active-Active configurations to address failures.
Emphasizes the crucial role of ordering in distributed systems and introduces methods like vector clocks and Paxos for effective ordering.
Introduces Apache Kafka as a distributed messaging system that operates on a log-based message storage mechanism for efficient data handling.
Discusses how Kafka partitions logs across various machines, enhancing performance with configurable acknowledgment modes to handle failures.
Details the Active-Passive replication method in Kafka, its use for asynchronous data replication, and trade-offs including potential data loss.
Explains Active-Active replication in Kafka with a focus on global view maintenance and challenges during failovers with examples.
Addresses challenges of deploying Kafka across data centers, including differences in offsets, potential duplicates, and solutions for real-time applications.
Illustrates strategies for deploying Kafka in multiple data centers, focusing on multi-tenancy, security, and optimized latency.
Provides an example of EC2 multi-AZ deployment addressing considerations like latency, Zookeeper needs, and maintaining data integrity.
Wraps up the discussion with key takeaways about multi-DC trade-offs, inviting attendees to engage with Confluent for further learning.
1.
When One Data Center is not Enough
Guozhang Wang Strata San Jose, 2016
Building large-scale stream infrastructure across multiple data centers with Apache Kafka
2.
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
Agenda
24.
ACK mode | Latency               | On Failures
"no"     | no network delay      | some data loss
"leader" | 1 network roundtrip   | a few messages may be lost
"all"    | ~2 network roundtrips | no data loss
Configurable ISR Commits
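To make the table concrete, here is a minimal producer sketch using the Java client, showing where the ack mode is set; the broker address and topic name are placeholders.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AckModeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-dc1:9092");  // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        // "0"   -> fire and forget: no network delay, data loss possible
        // "1"   -> wait for the leader only: one round trip, loss possible if the leader fails
        // "all" -> wait for the full ISR: ~2 round trips, no data loss (given enough in-sync replicas)
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"));  // placeholder topic
        }
    }
}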
25.
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
Agenda
26.
Option I: Active-Passive Replication
[Diagram: producers and consumers on a local Kafka cluster in DC 1; MirrorMaker replicates it to a replica Kafka cluster in DC 2]
27.
Option I: Active-Passive Replication
• Async replication across DCs
• May lose data on failover
• Example: ETL to data warehouse / HDFS
[Diagram: same Active-Passive topology as on the previous slide]
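MirrorMaker is essentially a consumer on the source cluster feeding a producer on the target cluster. The following is a rough sketch of that pipe, not the actual MirrorMaker code; cluster addresses and the topic name are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NaiveMirror {
    public static void main(String[] args) {
        // Consumer against the active (DC 1) cluster -- placeholder address.
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-dc1:9092");
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, "mirror");
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Producer against the passive (DC 2) replica cluster -- placeholder address.
        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-dc2:9092");
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.ByteArraySerializer");
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(Collections.singletonList("events"));  // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Re-produce into the replica cluster. Because this is asynchronous,
                    // records not yet mirrored are lost if DC 1 fails -- the trade-off above.
                    producer.send(new ProducerRecord<>(record.topic(), record.key(), record.value()));
                }
            }
        }
    }
}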
28.
Option II: Active-Active Replication
[Diagram: each DC runs a local Kafka cluster and an aggregate Kafka cluster; MirrorMaker copies both local clusters into both aggregate clusters, so consumers can switch to the DC 2 aggregate cluster on DC 1 failure]
29.
Option II: Active-Active Replication
• Global view on the aggregate cluster
• Requires offsets to resume
• Example: store materialization, index updates
[Diagram: same Active-Active topology as on the previous slide]
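Because both aggregate clusters hold the merged stream from both DCs, failover for a consumer is mostly a matter of pointing it at the other aggregate cluster; the catch is that committed offsets do not carry over, as the next slide explains. A rough sketch, with placeholder addresses and topic:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AggregateConsumer {
    // Connect to the local aggregate cluster while DC 1 is healthy,
    // otherwise to the DC 2 aggregate cluster.
    static KafkaConsumer<byte[], byte[]> connect(boolean dc1Healthy) {
        String bootstrap = dc1Healthy ? "kafka-agg-dc1:9092" : "kafka-agg-dc2:9092";  // placeholders
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "materializer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events"));  // placeholder topic
        // Note: offsets committed on the DC 1 aggregate cluster are meaningless
        // on the DC 2 aggregate cluster -- see the offset caveats below.
        return consumer;
    }
}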
30.
• Offsets not identical between Kafka clusters
• Duplicates during failover
• Partition selection may be different
• Solutions
• Resume from log end offset (suitable for real-time apps)
• Resume from a timestamp (ListOffsets, offset index: KIP-33)
Caveats: offsets across DCs
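Both resume strategies can be expressed with the Java consumer API; offsetsForTimes is the timestamp-based lookup that the KIP-33 time index enables (the client-side call itself was added later, via KIP-79). Topic, partition, and timestamp below are placeholders.

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ResumeAfterFailover {
    // Strategy 1: real-time apps simply skip to the log end on the new cluster.
    static void resumeFromLogEnd(KafkaConsumer<byte[], byte[]> consumer, TopicPartition tp) {
        consumer.assign(Collections.singletonList(tp));
        consumer.seekToEnd(Collections.singletonList(tp));
    }

    // Strategy 2: look up the offset for a timestamp (e.g. last known progress time
    // minus a safety window) and rewind to it, accepting some duplicates.
    static void resumeFromTimestamp(KafkaConsumer<byte[], byte[]> consumer,
                                    TopicPartition tp, long timestampMs) {
        consumer.assign(Collections.singletonList(tp));
        Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(Collections.singletonMap(tp, timestampMs));
        OffsetAndTimestamp ot = offsets.get(tp);
        if (ot != null) {
            consumer.seek(tp, ot.offset());
        } else {
            consumer.seekToEnd(Collections.singletonList(tp));  // no record at or after the timestamp
        }
    }
}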
31.
Option III: Deploy across DCs
[Diagram: a single Kafka cluster stretched across DC 1 and DC 2, with producers and consumers in both DCs]
33.
• Same region: essentially the same network
• Asymmetric partitioning is rare, latency is low
• Need at least 3 DCs for Zookeeper
• Reserved instances to reduce churn
• EIP for external clients, private IPs for internal communication
• Reserved instances, local storage
Example: EC2 multi-AZ Deployment
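A minimal sketch of the broker-side settings such a multi-AZ deployment typically touches, shown as a server.properties fragment; hostnames are placeholders, and the rack-aware replica placement via broker.rack arrived with KIP-36 in Kafka 0.10.

# Spread replicas across AZs (rack-aware assignment, KIP-36 / Kafka 0.10+)
broker.rack=us-east-1a
# Zookeeper ensemble spanning three AZs (placeholder hostnames)
zookeeper.connect=zk-az1:2181,zk-az2:2181,zk-az3:2181
# Keep copies in multiple AZs and require two in-sync replicas for acks=all
default.replication.factor=3
min.insync.replicas=2
# Advertise a public address (e.g. an EIP) to external clients; brokers use private IPs internally
advertised.listeners=PLAINTEXT://broker1.example.com:9092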
Thank you
Guozhang | guozhang@confluent.io | @guozhangwang
Meet Confluent in booth #838
Confluent University ~ Kafka training ~ confluent.io/training
Join the Stream Data Hackathon Apr 25, SF
kafka-summit.org/hackathon/
Download Apache Kafka & Confluent Platform
confluent.io/download