Apache BookKeeper and Apache
ZooKeeper for Apache Pulsar
Enrico Olivelli
DataStax - Luna Streaming Team
Member of Apache Pulsar, Apache BookKeeper and Apache ZooKeeper PMC,
Apache Curator VP
Agenda
● Introduction to Apache Pulsar architecture
● Overview of Apache ZooKeeper
● Overview of Apache BookKeeper
● ManagedLedger: Pulsar and BookKeeper
● Handling Failures while guaranteeing Consistency
● Live Demo with BKVM (BookKeeper Visual Manager)
Apache Pulsar Architecture
A cloud-native, distributed messaging and
streaming platform
Components of a Pulsar Cluster
- Clients
- Brokers
- Bookies
- ZooKeeper cluster
- Proxy (optional)
- Functions Workers (optional)
- Pulsar IO (optional)
- Tiered Storage (optional)
[Figure: Pulsar cluster diagram with Producers, Consumers, Proxies, Brokers, Bookies, ZooKeeper nodes, Functions, Pulsar IO and Object Storage]
Apache Pulsar - Core Concepts
- Topic:
- Sequence of Messages
- Persistent/Non-Persistent
- Partitioned/Non-Partitioned
- Tenant and Namespace:
- Logical and physical isolation of resources
- Fine grained configuration (topic/namespace/tenant/system levels)
- Subscription:
- A cursor over a topic (tracks status of acknowledgements)
- Modes: Exclusive, Failover, Shared, Key Shared
- Types: Durable, Non-Durable
- Producer:
- Normal, Exclusive
Apache ZooKeeper
- Born at Yahoo! and donated to the Apache Software Foundation
- Offers primitives for distributed systems coordination
- Implements a filesystem-like structure
- znodes are like directories and files
- Easy to understand
- No need for shared disks
- Strict ordering of operations
- Leader node + Followers (ZAB protocol)
- Enforced in the client
- Sessions
- Explicit notion of “lost connection”
- Heartbeat based expiration
- Ephemeral nodes
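To make sessions and ephemeral nodes concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the ZooKeeper API: `TinyCoordinator`, `heartbeat`, `expire_sessions` and the `/brokers/...` paths are hypothetical names chosen for illustration, and time is passed in explicitly so the behaviour is deterministic.

```python
class TinyCoordinator:
    """Toy model of sessions and ephemeral nodes (not the real ZooKeeper API)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}   # session id -> time of the last heartbeat
        self.ephemeral = {}        # znode path -> owning session id

    def heartbeat(self, session_id, now):
        # A heartbeat opens the session or keeps it alive.
        self.last_heartbeat[session_id] = now

    def create_ephemeral(self, path, session_id):
        if session_id not in self.last_heartbeat:
            raise RuntimeError("session expired or unknown")
        if path in self.ephemeral:
            raise RuntimeError("node already exists")
        self.ephemeral[path] = session_id

    def expire_sessions(self, now):
        # Sessions silent for longer than the timeout expire, and every
        # ephemeral node they own disappears with them.
        dead = {s for s, t in self.last_heartbeat.items()
                if now - t > self.session_timeout}
        for s in dead:
            del self.last_heartbeat[s]
        self.ephemeral = {p: s for p, s in self.ephemeral.items()
                          if s not in dead}
        return dead


zk = TinyCoordinator(session_timeout=10)
zk.heartbeat("broker-1", now=0)
zk.heartbeat("broker-2", now=0)
zk.create_ephemeral("/brokers/broker-1", "broker-1")
zk.create_ephemeral("/brokers/broker-2", "broker-2")

zk.heartbeat("broker-2", now=8)        # broker-1 stays silent
expired = zk.expire_sessions(now=12)   # only broker-1 exceeded the timeout
live_nodes = sorted(zk.ephemeral)
```

The point of the sketch is that liveness is derived from heartbeats: once a session expires, its ephemeral nodes vanish without any explicit cleanup by the failed process.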
Apache ZooKeeper in Apache Pulsar
- Service Discovery
- Leader Election
- Metadata Management
- Configuration Management
- Used by Apache BookKeeper
[Figure: diagram with Brokers, Bookies and ZooKeeper nodes]
Apache ZooKeeper - Conditional Writes (CW)
- Every znode has a version, a (small) content and possibly (few) children
setData(content, expectedVersion)
- Basic building block to ensure consistency
- Only the owner can update the znode
- Version conflict -> fail, assume you are no longer the owner
- Successful write -> prevents others from performing the same write (version
automatically incremented)
- Only one Broker can make progress at a time while working on metadata
This is not enough to ensure the overall consistency of the system!
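The conditional write described above can be sketched as a compare-and-set on the znode version. This is an illustrative in-memory model in Python, not the ZooKeeper client: the `Znode` class and `BadVersionError` are hypothetical stand-ins, and `set_data` mirrors the `setData(content, expectedVersion)` shape from the slide.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the stored one."""


class Znode:
    """Toy znode holding a small content and a version counter."""

    def __init__(self, content=b""):
        self.content = content
        self.version = 0

    def set_data(self, content, expected_version):
        # The write only succeeds if nobody else has written since we read.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, found {self.version}")
        self.content = content
        self.version += 1          # incremented automatically on success
        return self.version


node = Znode(b"ledgers=[123]")

# Two brokers read the same version, then both try to update the metadata.
version_seen_by_a = node.version
version_seen_by_b = node.version

node.set_data(b"ledgers=[123,124]", version_seen_by_a)   # broker A wins

try:
    node.set_data(b"ledgers=[123,125]", version_seen_by_b)
    broker_b_outcome = "written"
except BadVersionError:
    broker_b_outcome = "bad version"   # broker B must assume it lost ownership
```

Only the first writer makes progress; the second one learns from the version conflict that its view of the metadata is stale.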
Apache BookKeeper
- Born at Yahoo! and donated to the Apache Software Foundation
- Subproject of ZooKeeper, then graduated as TLP
- Implements a high performance distributed storage system
- Thick Java Client
- Bookie server: storage only
- Horizontally scalable
- Write/Read paths isolation
- Durability (journal/fsync)
- Replication
- Advanced placement policies
The Broker - the Heart of Pulsar
Each Broker is the Owner for a given set of topic bundles:
- Handles reads/writes
- Redirects requests for non-owned bundles to other brokers
- Handles subscriptions, consumers and producers status
- Keeps non-persistent topics data in memory
- Manages Schemas
- Handles cluster wide requests
The Broker uses Apache BookKeeper to store:
- Messages
- Subscriptions (acks)
- Schema
- Code packages (new in 2.8)
The Broker - Data flow when a message is produced
The Broker receives a request to publish a message:
● Verifies topic ownership
● Verifies authorization
● Locates the ManagedLedger instance
● Passes the encoded entry (single message or a batch) to
ManagedLedger
● ManagedLedger passes the entry to the active Ledger WriteHandle
● The BK client sends the entry in parallel to the Bookies
● Waits for acknowledgement from the Bookies
● Acknowledges the write back to the Pulsar client
● Now the Message ID is available to the client (LedgerID-EntryID...)
[Figure: Producer -> Broker (ManagedLedger) -> three Bookies]
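The publish flow above can be sketched end to end with a few in-memory classes. This is a simplified Python model under stated assumptions: `Bookie`, `LedgerWriter` and `Broker` are hypothetical names, the authorization check is omitted, the writer waits for every bookie (the real client uses configurable quorums), and the message ID is reduced to a `(ledger_id, entry_id)` pair.

```python
from concurrent.futures import ThreadPoolExecutor


class Bookie:
    """Toy storage node: keeps entries in memory and acknowledges each write."""

    def __init__(self, name):
        self.name = name
        self.entries = {}   # (ledger_id, entry_id) -> payload

    def add_entry(self, ledger_id, entry_id, payload):
        self.entries[(ledger_id, entry_id)] = payload
        return True         # acknowledgement


class LedgerWriter:
    """Stands in for the active ledger write handle."""

    def __init__(self, ledger_id, bookies):
        self.ledger_id = ledger_id
        self.bookies = bookies
        self.next_entry_id = 0

    def append(self, payload):
        entry_id = self.next_entry_id
        # Send the entry to all bookies in parallel, then wait for the acks.
        with ThreadPoolExecutor(max_workers=len(self.bookies)) as pool:
            acks = list(pool.map(
                lambda b: b.add_entry(self.ledger_id, entry_id, payload),
                self.bookies))
        if not all(acks):
            raise IOError("not every bookie acknowledged the entry")
        self.next_entry_id += 1
        return (self.ledger_id, entry_id)


class Broker:
    def __init__(self, owned_topics, writers):
        self.owned_topics = owned_topics
        self.writers = writers   # topic -> LedgerWriter

    def publish(self, topic, payload):
        if topic not in self.owned_topics:
            raise PermissionError("topic is owned by another broker")
        # The message ID is only known after the bookies have acknowledged.
        return self.writers[topic].append(payload)


bookies = [Bookie(f"bookie-{i}") for i in range(3)]
topic = "persistent://public/default/test"
broker = Broker({topic}, {topic: LedgerWriter(123, bookies)})

first_id = broker.publish(topic, b"hello")
second_id = broker.publish(topic, b"world")
```

The producer only receives `first_id` and `second_id` after the storage layer has confirmed the write, which is why the message ID can safely embed the ledger and entry identifiers.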
The Bookie - When the message is persisted
The Write path and the Read path are separated inside the Bookie.
Write path:
- The Bookie receives a copy of the entry
- The entry is written to the Journal
- The journal acknowledges the write after a successful fsync
- Entries are grouped in order to reduce the number of fsyncs
- The Bookie acknowledges the operation to the Client
The BookKeeper Client is responsible for:
- Selecting the Bookies (zone/region awareness)
- Waiting for confirmation
- Retransmissions
- Computing a checksum of the raw payload
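The grouping of journal writes and the client-side checksum can be illustrated with a small Python sketch. It is a toy model, not BookKeeper code: the `Journal` class is hypothetical, the `fsync_count` counter stands in for a real `fsync` call, and CRC32 via `zlib.crc32` is just one possible checksum chosen for the example.

```python
import zlib


class Journal:
    """Toy journal: buffers entries and acknowledges them after one 'fsync'."""

    def __init__(self):
        self.pending = []      # entries written but not yet durable
        self.durable = []      # entries covered by a completed fsync
        self.fsync_count = 0   # stands in for real fsync calls

    def write(self, entry, on_ack):
        # The entry is queued; the writer is NOT acknowledged yet.
        self.pending.append((entry, on_ack))

    def flush(self):
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.durable.extend(entry for entry, _ in batch)
        self.fsync_count += 1          # one fsync covers the whole group
        for entry, on_ack in batch:
            on_ack(entry)              # acknowledge only after the fsync


journal = Journal()
acked = []

for entry_id, payload in enumerate([b"a", b"b", b"c"]):
    # The client attaches a checksum of the raw payload to each entry.
    entry = (entry_id, payload, zlib.crc32(payload))
    journal.write(entry, on_ack=lambda e: acked.append(e[0]))

acked_before_flush = list(acked)
journal.flush()

# A reader can later verify each payload against its stored checksum.
checksums_ok = all(zlib.crc32(p) == c for _, p, c in journal.durable)
```

Three entries become durable with a single simulated fsync, and none of them is acknowledged before that fsync completes.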
The Broker - ManagedLedger abstraction
BookKeeper relies on ZooKeeper CW features to guarantee
consistency of metadata
The Pulsar ManagedLedger is an abstraction over the BookKeeper
Ledger:
- Implements an infinite append-only stream of entries
- Concatenates BK ledgers (metadata only)
- Implements Cursors (support for durable subscriptions)
- Implements Tiered Storage
[Figure: topic persistent://public/default/test as a sequence of ledgers: Ledger 123, Ledger 124, Ledger 137, Ledger 156, Ledger 168]
BookKeeper Ledger:
a write-once, append-only sequence of entries (byte[])
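The idea of an unbounded stream built by concatenating bounded ledgers, plus a cursor over it, can be sketched as follows. This is an illustrative Python model, not the Pulsar implementation: `ManagedLedgerSketch`, the entry-count rollover rule and the contiguous ledger IDs are assumptions made for the example (in a real cluster IDs are not contiguous, as the figure shows).

```python
class ManagedLedgerSketch:
    """Toy append-only stream made of a list of bounded ledgers."""

    def __init__(self, first_ledger_id, max_entries_per_ledger):
        self.max_entries = max_entries_per_ledger
        self.next_ledger_id = first_ledger_id
        self.ledger_ids = []   # the metadata: ordered list of ledger IDs
        self.ledgers = {}      # ledger ID -> list of entries
        self._roll_over()

    def _roll_over(self):
        # Existing ledgers are never modified; a new one is appended
        # to the list of ledger IDs.
        ledger_id = self.next_ledger_id
        self.next_ledger_id += 1
        self.ledger_ids.append(ledger_id)
        self.ledgers[ledger_id] = []

    def append(self, payload):
        current = self.ledger_ids[-1]
        if len(self.ledgers[current]) >= self.max_entries:
            self._roll_over()
            current = self.ledger_ids[-1]
        self.ledgers[current].append(payload)
        return (current, len(self.ledgers[current]) - 1)

    def read_after(self, position=None):
        """Yield (position, payload) for every entry after `position`."""
        for ledger_id in self.ledger_ids:
            for entry_id, payload in enumerate(self.ledgers[ledger_id]):
                if position is None or (ledger_id, entry_id) > position:
                    yield (ledger_id, entry_id), payload


ml = ManagedLedgerSketch(first_ledger_id=123, max_entries_per_ledger=2)
positions = [ml.append(p) for p in [b"m0", b"m1", b"m2", b"m3", b"m4"]]

# A durable subscription is a cursor: the last acknowledged position.
cursor = positions[1]
unread = [payload for _, payload in ml.read_after(cursor)]
```

Readers see one continuous stream, while the storage layer only ever deals with write-once ledgers and an ordered list of their IDs.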
Handling Failures and ensuring Consistency
Failures on Broker:
- Network error/partition
- Overwhelmed Broker (Garbage collection, out of memory/CPU)
- Shutdown (or forced Bundle unload)
- ….
A new Broker becomes the Owner for the Topic (ManagedLedger)
- Perform recovery on the current BK ledger
- Create a new Ledger on BK
- Append the new Ledger ID to the list of Ledgers
- Serve write requests (verifying on each operation that it is still the owner!)
More than one broker may start this recovery process!
ZooKeeper CW covers metadata operations,
but it does not help in the hot write path
BookKeeper Fencing and Recovery
- The new Broker opens the ledger in Recovery mode
- The BookKeeper Client reads every entry from the Bookies:
- Discovers the max valid entry id
- Sets the ledger fenced flag on the Bookies (on disk)
- Writes the new status of the Ledger to ZooKeeper
- This may fail during a CW operation!
- Only one broker can perform a successful recovery!
- The old broker:
- Receives a “Ledger Fenced error” on the next write
- Receives a “Bad Version error” while writing to ZooKeeper (if trying
to append a new ledger ID)
- It may receive a Watch Notification from ZooKeeper
At every write BookKeeper ensures the ownership of the Topic
BookKeeper fencing + ZooKeeper CW guarantee consistency of
Pulsar
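The interplay of fencing and conditional writes can be sketched in a few lines of Python. This is a deliberately simplified model: `Bookie`, `LedgerMetadata`, `recover` and the two error classes are hypothetical names, every bookie holds every entry, and quorum handling and re-replication of the ledger tail are left out.

```python
class LedgerFencedError(Exception):
    """A write hit a ledger that a recovering broker has fenced."""


class BadVersionError(Exception):
    """A conditional metadata write used a stale version."""


class Bookie:
    def __init__(self):
        self.entries = {}     # ledger_id -> {entry_id: payload}
        self.fenced = set()   # ledger IDs that no longer accept writes

    def add_entry(self, ledger_id, entry_id, payload):
        if ledger_id in self.fenced:
            raise LedgerFencedError(ledger_id)
        self.entries.setdefault(ledger_id, {})[entry_id] = payload

    def fence_and_read_last(self, ledger_id):
        # Set the fenced flag, then report the highest entry ID stored here.
        self.fenced.add(ledger_id)
        return max(self.entries.get(ledger_id, {}), default=-1)


class LedgerMetadata:
    """Toy ledger metadata protected by a version (conditional write)."""

    def __init__(self):
        self.state = "OPEN"
        self.last_entry_id = None
        self.version = 0

    def close(self, last_entry_id, expected_version):
        if expected_version != self.version:
            raise BadVersionError()
        self.state = "CLOSED"
        self.last_entry_id = last_entry_id
        self.version += 1


def recover(ledger_id, bookies, metadata):
    expected_version = metadata.version
    last = max(b.fence_and_read_last(ledger_id) for b in bookies)
    metadata.close(last, expected_version)   # fails if another broker won
    return last


bookies = [Bookie() for _ in range(3)]
metadata = LedgerMetadata()
for entry_id in range(3):                    # the old broker wrote entries 0..2
    for b in bookies:
        b.add_entry(123, entry_id, b"data")

stale_version = metadata.version             # version seen before the takeover
last_entry = recover(123, bookies, metadata) # the new broker takes over

try:
    bookies[0].add_entry(123, 3, b"late write")
    old_broker_write = "accepted"
except LedgerFencedError:
    old_broker_write = "fenced"

try:
    metadata.close(2, stale_version)         # a second, slower recovery attempt
    second_recovery = "succeeded"
except BadVersionError:
    second_recovery = "bad version"
```

The fenced flag stops the old broker on the hot write path, while the versioned metadata write ensures that only one recovery attempt can close the ledger.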
Live Demo - Inspect a Pulsar Standalone instance
- Start Pulsar Standalone
- Use Visual Studio Code to inspect ZooKeeper contents
- Use BKVM to inspect BookKeeper contents
- Write to public/default/test
- Unload the topic
- See that the ManagedLedger created a new Ledger
Wrapping up
● ZooKeeper and BookKeeper came from Yahoo!, as did Pulsar!
● Pulsar ManagedLedger is the high level abstraction over BookKeeper.
● ZooKeeper provides support for Metadata Management, Service Discovery,
Configuration and Leader Election.
● Conditional Writes (CW) guarantee consistency for Metadata operations.
● The Fencing mechanism of BookKeeper ensures Consistency on the write path
● In no case can two brokers write concurrently to a Topic; one of them will
eventually fail
References
LinkedIn - linkedin.com/in/enrico-olivelli-984b7874/
Twitter: twitter.com/eolivelli
Apache Pulsar Community: pulsar.apache.org/en/contact/ (Slack, ML…)
References:
Apache Pulsar website: pulsar.apache.org - github.com/apache/pulsar
Apache BookKeeper website: bookkeeper.apache.org - github.com/apache/bookkeeper
Apache ZooKeeper website: zookeeper.apache.org - github.com/apache/zookeeper
BKVM website: bkvm.org - github.com/diennea/bookkeeper-visual-manager
Thank you !
We are hiring: https://www.datastax.com/company/careers

