Event Sourcing, Stream Processing & Serverless
Ben Stopford
Office of the CTO, Confluent
What we’re going to talk about
• Event Sourcing
• What is it, and how does it relate to Event Streaming?
• Stream Processing as a kind of “Database”
• What does this mean?
• Serverless Functions
• How does this relate?
Can you do event sourcing
with Kafka?
Traditional Event
Sourcing
Popular example: Shopping Cart
DB
Apps
Search
Apps Apps
Database Table matches
what the user sees.
12.42
12.44
12.49
12.50
12.59
Event Sourcing stores events, then derives the
‘current state view’
Apps Apps
DERIVE
Chronological Reduce
Event
Timeseries
of user
activity
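A minimal sketch of the chronological reduce, in Python. The event shape (`ts`, `type`, `item`, `qty`), the item names and the `apply_event` function are assumptions made for illustration; only the timestamps come from the slide.

```python
from functools import reduce

# Hypothetical cart events, one per user action.
events = [
    {"ts": "12.42", "type": "ItemAdded", "item": "trousers", "qty": 1},
    {"ts": "12.44", "type": "ItemAdded", "item": "jumper", "qty": 2},
    {"ts": "12.49", "type": "ItemRemoved", "item": "jumper", "qty": 1},
    {"ts": "12.50", "type": "ItemAdded", "item": "hat", "qty": 1},
    {"ts": "12.59", "type": "ItemRemoved", "item": "trousers", "qty": 1},
]


def apply_event(cart, event):
    """Fold one event into the cart and return a new cart (item -> quantity)."""
    cart = dict(cart)
    delta = event["qty"] if event["type"] == "ItemAdded" else -event["qty"]
    qty = cart.get(event["item"], 0) + delta
    if qty > 0:
        cart[event["item"]] = qty
    else:
        cart.pop(event["item"], None)
    return cart


def current_state(events):
    """Derive the 'current state view' by reducing events in time order."""
    ordered = sorted(events, key=lambda e: e["ts"])
    return reduce(apply_event, ordered, {})
```

The stored data is the list of events; the cart the user sees is computed from it and can be recomputed at any time.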
Traditional Event Sourcing
(Store immutable events in a database in time order)
Apps
Search
NoSQL
Monitoring
Security
Apps Apps
STREAMING PLATFORM
Table of events
Persist events
Apps Apps
Traditional Event Sourcing (Read)
Apps
Search
NoSQL
Monitoring
Security
Apps Apps
STREAMING PLATFORM
Apps
Search Monitoring
Apps Apps
Chronological
Reduce on read
(done inside the app)
Query by
customer Id
(+session?)
- No schema migration
- Similar to ‘schema on read’
3 Benefits
Evidentiary
Accountants don’t use erasers
(e.g. audit, ledger, git)
Replayability
Recover corrupted data after a
programmatic bug
Analytics
Keep the data needed to
extract trends and behaviors
i.e. non-lossy
(e.g. insight, metrics, ML)
Traditional Event Sourcing
• Use a database (any one will do)
• Create a table and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state
*Aggregate ID in DDD parlance
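The four steps above can be sketched with any database. The version below uses Python's built-in `sqlite3`; the table layout, the `delta` field and the helper names `append` and `load_cart` are assumptions for illustration, not something the talk prescribes.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # insertion order stands in for time order
    " aggregate_id TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)


def append(aggregate_id, event):
    """Insert an event as it occurs; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO events (aggregate_id, payload) VALUES (?, ?)",
        (aggregate_id, json.dumps(event)),
    )


def load_cart(aggregate_id):
    """Query all events for one aggregate, then reduce them chronologically."""
    rows = conn.execute(
        "SELECT payload FROM events WHERE aggregate_id = ? ORDER BY seq",
        (aggregate_id,),
    )
    cart = {}
    for (payload,) in rows:
        event = json.loads(payload)
        qty = cart.get(event["item"], 0) + event["delta"]
        if qty > 0:
            cart[event["item"]] = qty
        else:
            cart.pop(event["item"], None)
    return cart


append("customer-1", {"item": "hat", "delta": 1})
append("customer-2", {"item": "scarf", "delta": 1})
append("customer-1", {"item": "hat", "delta": 1})
append("customer-1", {"item": "hat", "delta": -1})
```

The `WHERE aggregate_id = ?` clause is the step that has no direct equivalent on a Kafka topic.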
Traditional Event Sourcing with Kafka
• Use Kafka (instead of a database)
• Create a topic (instead of a table) and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state
*Aggregate ID in DDD parlance
Confusion: You can’t query Kafka by, say, Customer ID*
*Aggregate ID in DDD parlance
If we can’t query by Customer ID
then what do we do?
CQRS is a tonic: Cache the projection in a ‘View’
Apps
Search Monitoring
Apps Apps
STREAMING PLATFORM
Query by customer Id
Apps
Search
NoSQL
Apps Apps
DWH
Hadoop
STREAMING PLATFORM
View
Events/Command
Events are the
Storage Model
Stream Processor
Cache/DB/Ktable etc.
Regenerate the view
rather than doing
schema migration
CQRS provides the
benefits of event
sourcing using a
“Materialized View”
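A rough in-memory sketch of the CQRS idea in Python. `EventLog` stands in for a Kafka topic and `CartView` for the cache, database or KTable; both classes and the event fields are hypothetical and exist only to show the shape of the pattern.

```python
class EventLog:
    """Append-only log standing in for a Kafka topic (illustrative only)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return self._events[offset:]


class CartView:
    """Read side: a cached projection that can be queried by customer id."""

    def __init__(self, log):
        self._log = log
        self._offset = 0
        self._carts = {}

    def catch_up(self):
        """Apply any events that arrived since the last call."""
        for event in self._log.read_from(self._offset):
            cart = self._carts.setdefault(event["customer_id"], {})
            cart[event["item"]] = cart.get(event["item"], 0) + event["delta"]
            self._offset += 1

    def get(self, customer_id):
        return dict(self._carts.get(customer_id, {}))

    def rebuild(self):
        """Throw the view away and regenerate it from the start of the log."""
        self._offset = 0
        self._carts = {}
        self.catch_up()
```

Writes go to the log and reads go to the view. Until `catch_up` runs, the view lags behind the log, which is the eventual consistency discussed later in the talk. Changing the shape of the view means changing the projection code and calling `rebuild`, rather than migrating a schema.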
Even with CQRS, Event Sourcing is Hard
CQRS helps, but it’s still quite hard if you’re a CRUD app
What’s the problem?
Harder:
• Eventually Consistent
• Multi-model (Complexity ∝ #Schemas in the log)
• More moving parts
Apps
Search
NoSQL
Monitoring
Security
Apps Apps
STREAMING PLATFORM
CRUD System CQRS
New York Times Website
Source of Truth
Every article since
1851
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
Normalized assets
(images, articles, bylines, tags
all separate messages)
Denormalized into
“Content View”
If CRUD makes sense there are other ways:
audit tables, CDC, etc.
Trigger
Evidentiary
Replayable N/A to web app
Analytics
CDC
More advanced: Use a Bi-Temporal Database
Events make most sense
where data has to move
This is where CQRS comes
into its own!
Online Transaction Processing: e.g. a Flight Booking System
- Flight price served 10,000 x #bookings
- Consistency required only at booking time
CQRS with event movement
Apps
Search Monitoring
Apps Apps
STREAMING PLATFORM
Apps
Search
NoSQL
Apps Apps
DWH
Hadoop
STREAMING PLATFORM
View
Book Flight
Apps
Search
Apps
S T R E A M I N G P L A
View
Apps
Search
NoSQL
Apps
DWH
S T R E A M I N G P L A
View
Get Flights
Get Flights
Get Flights
Global Read
Central Write
The exact same logic applies
to microservices
Event Sourcing for Microservices
Basket Service
Fraud Service
Billing Service
Email Service
Basket Events
Event Sourcing for Microservices
Basket Service
Fraud Service
Billing Service
Email Service
Basket Events
Events are the
storage model
Each microservice creates a
view that suits its use case
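As an illustration, two services can fold the same basket events into different views. The event fields, the two view functions and the fraud threshold below are invented for this sketch.

```python
# One shared stream of basket events (hypothetical shape).
basket_events = [
    {"customer": "alice", "item": "laptop", "price": 1200},
    {"customer": "bob", "item": "pen", "price": 2},
    {"customer": "alice", "item": "mouse", "price": 25},
]


def billing_view(events):
    """Billing cares about the amount owed per customer."""
    totals = {}
    for event in events:
        totals[event["customer"]] = totals.get(event["customer"], 0) + event["price"]
    return totals


def fraud_view(events, threshold=1000):
    """Fraud cares only about unusually expensive items (threshold is made up)."""
    return [event for event in events if event["price"] >= threshold]
```

Neither service owns the events. Each one owns only its view, and can change or rebuild it without coordinating with the other.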
Event Sourcing “with a DB”
for monoliths.
Event Streaming for
Microservices & Scale.
(Often via CQRS)
Event Streaming
Event Streaming is a more general form of Event Sourcing/CQRS
Event Streaming
• Events as shared data model
• Many microservices
• Polyglot persistence
• Event-Driven processing
Traditional Event Sourcing
• Events as a storage model
• Single microservice
• Single DB
• data-at-rest
Event Streaming is about many event sources
(Join, Filter, Transform and Summarize)
Fraud Service
Orders
Service
Payment
Service
Customer
Service
Event Log
Projection created in
Kafka Streams API
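The four operations can be shown on plain Python lists. This is a batch stand-in, not the Kafka Streams API: a real stream processor would do the same work continuously and keep the state itself. The field names and the `fraud_projection` function are assumptions.

```python
def fraud_projection(orders, payments, customers):
    """Join, filter, transform and summarize three event sources into one view."""
    customer_by_id = {c["id"]: c for c in customers}  # table: latest record per key
    paid_order_ids = {p["order_id"] for p in payments}

    totals_by_region = {}
    for order in orders:
        if order["id"] not in paid_order_ids:  # filter: paid orders only
            continue
        customer = customer_by_id.get(order["customer_id"])  # join: order to customer
        if customer is None:
            continue
        region = customer["region"]  # transform: keep only the field we need
        totals_by_region[region] = totals_by_region.get(region, 0) + order["amount"]  # summarize
    return totals_by_region
```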
KStreams & KSQL have different positioning
• KStreams is a library for Dataflow programming:
• App Logic & Stream Processor (including state) are combined.
• Apps are stateful.
• JVM only.
• KSQL is a ‘database’ for event preparation:
• App sends SQL to a separate process
• Apps are stateless
• Connect from any language
This difference makes most
sense if we look to the
future.
Cloud & Serverless
Thesis
• Serverless provides event-driven infrastructure
• KSQL is the corollary: an event-driven database
Serverless Functions (FaaS)
• Write a function
• Upload
• Configure a trigger (HTTP, Messaging, Object Store, Database, Timer etc.)
Request Respond Event Source
FaaS in a Nutshell
• Fully managed (Runs in a container pool)
• Pay for execution time (not resources used)
• Auto-scales with load
• 0-1000+ concurrent functions
• Stateless
• Short lived (limit 5-15 mins)
• Weak ordering guarantees
• Cold starts can be (very) slow: 100ms – 45s (AWS 250ms-7s)
Where is FaaS useful?
• Spiky workloads and ‘occasional’ use cases
• Use cases that don’t typically warrant massive parallelism
e.g. CI systems.
• General purpose programming paradigm?
But there are open questions
Serverless Developer Ecosystem
• Runtime diagnostics
• Monitoring
• Deploy loop
• Testing
• IDE integration
Currently quite poor
Harder than current approaches Easier than current approaches
Amazon
Google
Microsoft
Serverless programming will likely become prevalent
In the future it seems
unlikely we’ll manage our
own infrastructure.
But where will we manage
our data?
Event-Streaming approaches this
from a different angle
FaaS is event-driven
But it isn’t streaming
Complex, Timing issues, Scaling limits
Customers
Event Source
Orders
Event Source
Payments
Event Source
Serverless functions handle only one event source
FaaS/μS
FaaS/μS
FaaS/μS
A slightly more complex
example:
Send email only to
platinum customers
Payments
Event Source
Event is received by serverless function
FaaS/μS
Payments
Event Source
Function blocks while it calls the database to get customer + order
FaaS/μS
Get customer
Get order
Payments
Event Source
Is it a ‘Platinum’ customer?
FaaS/μS
Get customer
Get order
Is the customer
platinum?
Payments
Event Source
Send email if ‘Platinum’
FaaS/μS
Get customer
Get order
Maybe send email
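The walkthrough above, sketched as a Python function. The in-memory `ORDERS` and `CUSTOMERS` dictionaries stand in for a remote database, and every name here (including `handle_payment` and the `level` field) is hypothetical. Lookups are counted because in a real function each one would be a blocking network call.

```python
# Hypothetical stand-ins for a remote database.
ORDERS = {
    "o1": {"customer_id": "c1"},
    "o2": {"customer_id": "c2"},
    "o3": {"customer_id": "c2"},
}
CUSTOMERS = {
    "c1": {"level": "PLATINUM", "email": "ann@example.com"},
    "c2": {"level": "STANDARD", "email": "bob@example.com"},
}
stats = {"db_calls": 0}
sent_emails = []


def get_order(order_id):
    stats["db_calls"] += 1
    return ORDERS[order_id]


def get_customer(customer_id):
    stats["db_calls"] += 1
    return CUSTOMERS[customer_id]


def send_email(address, message):
    sent_emails.append((address, message))


def handle_payment(event):
    """Invoked once per payment event; does two lookups before deciding anything."""
    order = get_order(event["order_id"])
    customer = get_customer(order["customer_id"])
    if customer["level"] == "PLATINUM":
        send_email(customer["email"], f"Payment {event['payment_id']} received")


for payment in [
    {"payment_id": "p1", "order_id": "o1"},
    {"payment_id": "p2", "order_id": "o2"},
    {"payment_id": "p3", "order_id": "o3"},
]:
    handle_payment(payment)
```

Every invocation pays for both lookups, including the ones that end up sending nothing.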
Payments
Event Source
Increase Load: 100 concurrent functions doing IO.
FaaS/μS (×14 in the diagram)
Payments
Event Source
Only send 2 emails.
FaaS/μS (×14 in the diagram)
Send SQL
Process
boundary
Orders
Payments
KSQL
Customers
Table
Customers
KSQL simplifies:
App
Logic
CREATE STREAM foo AS
SELECT * FROM orders, payments,
customers
LEFT JOIN…
WHERE customer.type = 'PLATINUM'
Order
Payment
Customer
KSQL
- Handle timing issues
- No “per-event” IO.
- Price efficient
Functions have no
additional data
dependencies:
Everything is in the event!
Queries filter out the
events you need
(much like you filter rows in a
database query)
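A Python stand-in for that division of labour. `prepare_platinum_payments` plays the role of the upstream query and `handle_prepared_event` is the function body; both names and the event fields are assumptions, and the sketch ignores windowing and out-of-order arrival, which the stream processor would have to handle.

```python
def prepare_platinum_payments(payments, orders, customers):
    """Upstream: join each payment to its order and customer, keep platinum only."""
    for payment in payments:
        order = orders.get(payment["order_id"])
        if order is None:
            continue
        customer = customers.get(order["customer_id"])
        if customer is None or customer["level"] != "PLATINUM":
            continue
        yield {
            "payment_id": payment["payment_id"],
            "order_id": payment["order_id"],
            "customer_email": customer["email"],
        }


def handle_prepared_event(event, send_email):
    """The function: no lookups, because everything it needs is in the event."""
    send_email(event["customer_email"], f"Payment {event['payment_id']} received")


# Example wiring with made-up data.
orders = {"o1": {"customer_id": "c1"}, "o2": {"customer_id": "c2"}}
customers = {
    "c1": {"level": "PLATINUM", "email": "ann@example.com"},
    "c2": {"level": "STANDARD", "email": "bob@example.com"},
}
payments = [
    {"payment_id": "p1", "order_id": "o1"},
    {"payment_id": "p2", "order_id": "o2"},
]
sent = []
for prepared in prepare_platinum_payments(payments, orders, customers):
    handle_prepared_event(prepared, lambda address, message: sent.append((address, message)))
```

The function now runs only for events that lead to an email, and it performs no IO of its own before sending.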
FaaS FaaS FaaS KSQL
Customers
Table
KSQL as a “Database” for Event-Driven Infrastructure
FaaS FaaS
Stateless,
elastic compute
Prepare the
events we need
(Stateful)
Orders
Payments
Customers
Autoscale
with load
FaaS
Traditional
Application
Event-Driven
Application
Application
Database
KSQL
Stateful
Data Layer
FaaS
FaaS
FaaS
FaaS
FaaS
Streaming
Stateless
Stateless
Stateless
Compute Layer
Massive linear scalability with elasticity
Event-Driven vs. Event Streaming
                             Event Driven                            Event Streaming
Multiple Event Sources       Use Database + ETL + Code               Handles automatically
Efficiency                   Extract data from DB in the FaaS (IO)   Only the data you need
Logic-driven data requests   Call DB from the FaaS (IO)              DB / KStreams / ksqlDB?
Event Streaming Platform
Summary
• Event Streaming provides the benefits of Event Sourcing to
microservices and data pipelines.
• Events are the data model.
• Projections are the serving model: matching to each specific use case
• Serving layer can be regenerated from the log (CQRS)
• KSQL provides the same benefits for event-driven programs: e.g.
preparing the event streams for each FaaS application’s specific needs
• In serverless architectures this drives efficiency: a
‘database-equivalent’ for event-driven infrastructure.
FaaS FaaS FaaS KSQL
Can I Build This?
FaaS FaaS
AWS Lambda /
Azure Functions Connectors
(in Preview)
Hosted KSQL In Preview
Confluent Cloud
Things I didn’t tell you
• Tools like KSQL provide data provisioning, not state mutation.
• Use single writers. Try ksqlDB?
• Can KSQL handle large state?
• Unintended rebalance can stall processing
• Static membership (KIP-345) – name the list of stream processors
• Increase the timeout for rebalance after node removal (group.max.session.timeout.ms)
• Worst case reload: RocksDB ~GbE speed
• Can Kafka be used for long term storage?
• Log files are immutable once they roll (unless compacted)
• Jun spent a decade working on DB2
• Careful:
• Historical reads can stall real-time requests (cached)
• ZFS has several page cache optimizations
• Tiered storage will help
Find out More
• Peeking Behind the Curtains of Serverless Platforms, Wang et al.
• Cloud Programming Simplified: A Berkeley View on Serverless Computing
• Neil Avery’s Journey to Event Driven Part 3. The Affinity Between Events, Streams and Serverless.
• Designing Event Driven Systems, Ben Stopford
Thank you
@benstopford
Book:
https://www.confluent.io/designing-event-driven-systems

Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) Kafka Summit SF 2019
