Apache Kafka and Analytics
in a Connected IoT World
Kai Waehner
Technology Evangelist
contact@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
Event Streaming with Apache Kafka
IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
STREAM
PROCESSING
Create and store
materialized views
Filter
Analyze in-flight
Time
TRADITIONAL DATABASE
Active Query: SELECT * FROM DB_TABLE
Passive Data: DB Table
EVENT STREAM PROCESSING
Passive Query: CREATE TABLE T AS SELECT * FROM EVENT_STREAM
Active Data: Event Stream
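The contrast between a one-shot query over stored data and a standing query fed by events can be sketched in plain Python. This is only an illustration under stated assumptions: it uses no Kafka or ksqlDB API, the names `select_all`, `MaterializedView`, and `run` are hypothetical, and the view simply keeps the latest value per key, which is one simple reading of CREATE TABLE T AS SELECT * FROM EVENT_STREAM.

```python
from typing import Dict, Iterable, Tuple

# Traditional database: the data is passive, the query is active.
db_table: Dict[str, int] = {"JAY": 695, "SUE": 430}


def select_all(table: Dict[str, int]) -> Dict[str, int]:
    """One-shot query: returns a snapshot of the table and then finishes."""
    return dict(table)


class MaterializedView:
    """Standing (passive) query: registered once, updated by every event."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    def on_event(self, key: str, value: int) -> None:
        # Keep only the latest value per key (hypothetical view semantics).
        self.rows[key] = value


def run(view: MaterializedView, events: Iterable[Tuple[str, int]]) -> None:
    """Feed events to the view; the data is active and drives the result."""
    for key, value in events:
        view.on_event(key, value)


snapshot = select_all(db_table)

view = MaterializedView()
run(view, [("JAY", 670), ("SUE", 430), ("JAY", 710)])
after_first_batch = dict(view.rows)

# A later event changes the view without anyone re-issuing a query.
run(view, [("JAY", 695)])
```

The snapshot never changes after it is taken, while `view.rows` keeps following the stream.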
TABLES
USER    CREDIT_SCORE
JAY     695
SUE     430
FRED    710
(figure labels: V1, V2, V3)

STREAMS
USER    PAYMENTS
JAY     42
SUE     18
FRED    65
...     ...
PUSH (the APP receives every change as it happens)
Jay’s credit score is 670
Jay’s credit score is 710
Jay’s credit score is 695
PULL (the APP asks for the current value)
What is Jay’s credit score now?
695
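A small Python sketch can make the push and pull styles concrete. It is an illustration only, not the ksqlDB client API: the class `CreditScores` and its methods `subscribe`, `update`, and `pull` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List


class CreditScores:
    """Keeps the latest score per user and notifies subscribers of changes."""

    def __init__(self) -> None:
        self._table: Dict[str, int] = {}
        self._subscribers: List[Callable[[str, int], None]] = []

    def subscribe(self, callback: Callable[[str, int], None]) -> None:
        """Push style: the app registers once and is called on every update."""
        self._subscribers.append(callback)

    def update(self, user: str, score: int) -> None:
        self._table[user] = score
        for callback in self._subscribers:
            callback(user, score)

    def pull(self, user: str) -> int:
        """Pull style: the app asks once and gets the current value."""
        return self._table[user]


pushed: List[str] = []
scores = CreditScores()
scores.subscribe(lambda user, score: pushed.append(f"{user}'s credit score is {score}"))

for new_score in (670, 710, 695):
    scores.update("Jay", new_score)

current = scores.pull("Jay")
```

The push subscriber sees all three changes in order, while the pull lookup only returns the latest value.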
Apache Kafka - The Rise of an Event Streaming Platform
= Messaging + Storage + Integration + Processing
Components: The Log, Producer, Consumer, Connectors, Streaming Engine
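How messaging and storage combine in a single log can be sketched as follows. This is a toy, in-memory model under stated assumptions, not the Kafka client API: the class `Log`, its `append` and `poll` methods, and the group names are hypothetical. The point it shows is that records are stored once and each consumer group tracks its own read position.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class Log:
    """Append-only log with one read offset per consumer group."""

    def __init__(self) -> None:
        self._records: List[Record] = []
        self._offsets: Dict[str, int] = {}

    def append(self, key: str, value: str) -> int:
        """Producer side: store the record and return its offset."""
        self._records.append((key, value))
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[Record]:
        """Consumer side: read from the group's offset and advance it."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = Log()
for i in range(3):
    log.append("sensor-1", f"reading-{i}")

fast = log.poll("dashboard")                      # reads everything so far
slow_first = log.poll("archiver", max_records=1)  # a slower, independent reader

log.append("sensor-1", "reading-3")
fast_next = log.poll("dashboard")                 # only the new record
slow_rest = log.poll("archiver")                  # catches up at its own pace
```

Because reading does not remove records, the slow consumer never blocks the fast one and can still catch up later.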
Apache Kafka at Scale at Tech Giants
> 7 trillion messages / day
> 6 Petabytes / day
“You name it”
* Kafka is not just used for big data
** Kafka is not just used by tech giants
10 Reasons for Event Streaming with Apache Kafka
Real Time
Scalable
Cost Reduction
24/7 – Zero downtime, zero data loss
Decoupling – Storage, Domain-driven Design
Data (re-)processing and stateful client applications
Integration – Connectivity to IoT, legacy, big data, everything
Hybrid Architecture – On Premises, multi cloud, edge computing
Fully managed cloud
No vendor lock-in
Apache Kafka is not an IoT Platform!
Device management
Unreliable networks
Connectivity beyond standards
Lightweight edge hardware
…
Consumer IoT and Industrial IoT (IIoT)
Use Cases
Ride-Sharing
More than just Messaging! Data correlation in real-time
for map-matching, ETA, cost calculation, and much more…
https://eng.lyft.com/a-new-real-time-map-matching-algorithm-at-lyft-da593ab7b006
Connected Car Infrastructure
https://www.youtube.com/watch?v=yGLKi3TMJv8
• Real Time Data Analysis
• Swarm Intelligence
• Collaboration with Partners
• Predictive AI
• …
Tesla
Trillions of messages per day for IoT use cases
https://www.confluent.io/kafka-summit-san-francisco-2019/0-60-teslas-streaming-data-platform/
https://www.confluent.io/blog/stream-processing-iot-data-best-practices-and-techniques/
Track, manage, and locate tools and other equipment anytime and anywhere, from the warehouse to the jobsite
https://www.confluent.io/customers/bosch/
https://events.confluent.io/online-talks/bosch-power-toolse-nables-real-time-analytics-on-iot-event-streams
Deutsche Bahn AG | Passenger Information (Reisendeninformation)
Consistent real-time information for travellers across Germany
RI-Plattform (passenger information platform)
Inputs: Customer timetable, Operational timetable, Assignments, Railway station knowledge, Dispositions, Train positions
Apache Kafka: Matching, Aggregation, Consolidation, Analysis
Outputs: Railway station, Trains, Mobile Apps, Employees
Food Value Chain
IoT-Based and Data-Driven
Single source of truth across the food value chain (in the factories, and across regions)
Business critical operations (tracking, calculations, alerts, …)
https://www.confluent.io/blog/creating-iot-based-data-driven-food-value-chain-with-confluent-cloud/
Postmodern ERP (coined by Gartner)
Replace legacy, monolithic and highly customized ERP suites with a mixture of loosely coupled, exchangeable cloud-based and on-premises applications.
Core ERP
TMS: Legacy Proprietary, SOAP Web Services
CRM: SaaS, Kafka Interface
MES: Proprietary, HTTP Web Services
LMS: Legacy Homegrown, Database + CDC
SRM: Kafka-native
Supplier, Alert, Forecast, Inventory, Customer, Order
Real Time Supply Chain and Retailing IoT Platform
@ Mojix
https://www.confluent.io/customers/mojix/
Real-time operational intelligence with complex event processing
Inventory accuracy increased from 65% to 99%
Omnichannel sales
Built using Confluent Cloud, Kafka, Kafka Connect and Kafka Streams
Hybrid cloud across the edge (at retail stores and distribution centers) and the cloud
Variety of sources, including RFID readers, camera sensors, beacons, mobile devices and routers
Cross-Company Supply Chain Integration
Streaming Replication and API Management
Replication options: MirrorMaker 2, Confluent Replicator, Cluster Linking
Parties: Tier 2 Supplier, Tier 1 Supplier, OEM
Streaming integration between companies
API Management (REST et al) is not appropriate for streaming data
Infosec and politics are your biggest hurdle
Augmented Reality for Smart Assistance
with Apache Kafka, Kafka Connect and ksqlDB
Pre-Processing and Data Correlation
(Kafka Streams / ksqlDB)
Components: Operator (REST Proxy), MES (Java), Robots (C++)
Data flows: Receive Command, Send Live Metrics, Send Command, Send Production Status, Receive Correlated Information
Cybersecurity
The threat is real!
Challenges
Stealing IP
DDoS
Ransomware / wiperware
WannaCry, NotPetya, …
Damage: Billions of dollars
“Supply chain attack”
Industry 4.0
Networking
Communication
Connectivity
Open standards
“Always-on”
Legacy SIEM needs to evolve
Data sources: Network traffic, Firewall logs, RDBMS, Application logs, Sensor Data, HTTP proxy logs
Collection: Forwarder, Adaptors, Beats
Challenges:
● Proprietary forwarders that can only send data to a single destination
● Data locked from being shared
● Difficult to scale with growing data volumes
● Prohibitively high indexing costs
● Unable to filter out noisy data
● Slow batch processing
Modernized security information and event management (SIEM)
Data sources: Network traffic, Firewall logs, RDBMS, Application logs, Sensor Data, HTTP proxy logs
Ingestion: Syslog, CDC
Stream processing: Filter, transform, aggregate
Curated streams
Consumers: APP, AI/ML
SIEM Index / Search: QRadar, Arcsight, Splunk, Elastic
Forensic Archive: HDFS, S3, Big Query
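The filter, transform, and aggregate step that produces curated streams can be sketched in plain Python. This is a minimal illustration under stated assumptions: the event fields (`source`, `action`, `status`, `src_ip`), the helper names, and the sample events are all hypothetical, and a real deployment would express this logic in a stream processor such as Kafka Streams or ksqlDB rather than over a list.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

Event = Dict[str, object]

# Hypothetical raw events from several sources, including noise.
raw_events = [
    {"source": "firewall", "action": "DENY", "src_ip": "10.0.0.5"},
    {"source": "firewall", "action": "ALLOW", "src_ip": "10.0.0.7"},
    {"source": "app", "level": "DEBUG", "msg": "heartbeat"},
    {"source": "firewall", "action": "DENY", "src_ip": "10.0.0.5"},
    {"source": "proxy", "status": 403, "src_ip": "10.0.0.9"},
]


def is_relevant(event: Event) -> bool:
    """Filter: keep only blocked requests, drop noisy events."""
    return event.get("action") == "DENY" or event.get("status") == 403


def normalize(event: Event) -> Event:
    """Transform: map source-specific fields to one common shape."""
    return {"source": event["source"], "src_ip": event["src_ip"], "kind": "blocked"}


def curate(events: Iterable[Event]) -> Iterator[Event]:
    """Produce the curated stream that downstream tools would index."""
    for event in events:
        if is_relevant(event):
            yield normalize(event)


curated_stream = list(curate(raw_events))

# Aggregate: count blocked requests per source IP.
blocked_per_ip = Counter(event["src_ip"] for event in curated_stream)
```

Only the curated events would need to reach the SIEM index, which is how filtering upstream can reduce indexing volume.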
BMW Group
Industry-ready NLP Service Framework Based on Kafka
https://www.confluent.io/kafka-summit-lon19/industry-ready-nlp-service-framework-kafka/
Streaming Ingestion and Model Training with Kafka, Tiered Storage and TensorFlow IO
Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!)
https://github.com/tensorflow/io
Diagram: Producer, Distributed Commit Log, Model A, Model B, Model X (at a later time), along a time axis
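The idea that several models can be trained from the same retained log, including one trained at a later time, can be sketched without any ML framework. This example does not use the TensorFlow I/O API; the list `commit_log`, the function `fit_line`, and the synthetic data are assumptions made purely for illustration, and the "model" is a simple least-squares line.

```python
from typing import List, Tuple

# Hypothetical retained log of (x, y) training events, in arrival order.
commit_log: List[Tuple[float, float]] = []


def fit_line(records: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b over the given records."""
    n = len(records)
    sx = sum(x for x, _ in records)
    sy = sum(y for _, y in records)
    sxx = sum(x * x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


# The producer writes the first events; Model A trains from offset 0.
for x in range(5):
    commit_log.append((float(x), 2.0 * x + 1.0))
model_a = fit_line(commit_log[0:])

# More events arrive; Model B starts reading at a later offset.
for x in range(5, 10):
    commit_log.append((float(x), 2.0 * x + 1.0))
model_b = fit_line(commit_log[5:])

# Model X is created later and replays the full history from offset 0.
model_x = fit_line(commit_log[0:])
```

Since the log itself retains the events, no separate copy of the training data is needed for the later model.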
Confluent Tiered Storage for Kafka
Kafka: Processing (transactions, auth, quota enforcement, compaction, ...) and Storage (Local, Remote)
Remote storage: Object Store
Apps
Store Forever
Older data is offloaded to inexpensive object storage, permitting it to be consumed at any time.
Save $$$
Storage limitations, like capacity and duration, are effectively uncapped.
Instantaneously scale up and down
Your Kafka clusters will be able to automatically self-balance load and hence elastically scale
(Only available in Confluent Platform)
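The offloading behaviour can be sketched as a toy model in Python. This is not how Confluent Tiered Storage is implemented or configured: the class `TieredLog`, the `local_limit` parameter, and the dictionaries standing in for local disk and an object store are hypothetical. It only shows the principle that old records move to a cheaper tier while remaining readable by offset.

```python
from typing import Dict


class TieredLog:
    """Keeps the newest records locally and offloads older ones to a remote tier."""

    def __init__(self, local_limit: int) -> None:
        self.local_limit = local_limit
        self.local: Dict[int, str] = {}   # stands in for broker disks
        self.remote: Dict[int, str] = {}  # stands in for an object store
        self.next_offset = 0

    def append(self, value: str) -> int:
        offset = self.next_offset
        self.local[offset] = value
        self.next_offset += 1
        self._offload()
        return offset

    def _offload(self) -> None:
        # Move the oldest local records to the remote tier once over the limit.
        while len(self.local) > self.local_limit:
            oldest = min(self.local)
            self.remote[oldest] = self.local.pop(oldest)

    def read(self, offset: int) -> str:
        # Readers use the same call regardless of where the record lives.
        if offset in self.local:
            return self.local[offset]
        return self.remote[offset]


tiered = TieredLog(local_limit=2)
for i in range(5):
    tiered.append(f"event-{i}")

oldest_value = tiered.read(0)   # served from the remote tier
newest_value = tiered.read(4)   # served from local storage
```

The consumer-facing `read` call is the same for both tiers, which is why old data stays consumable after it has been offloaded.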
Machine Vision for Quality Assurance and Yield Management
Apache Kafka and Applied Machine Learning
Data sources: Sensor Data, SCADA, MES, PLCs, Images from Products of Assembly Lines
Stream processing: Filter, transform, aggregate, orchestrate
Consumers: APP, BI Tool, AI/ML, Real-time alerting, Machine Vision for Quality Inspection, Reporting, Backup, Data Lake
Teams: OT Team, Plant Manager, IT Team, Live Ops, Data Science Team
Why Confluent
The Rise of Event Streaming
2010: Apache Kafka created at LinkedIn by Confluent founders
2014
2020
80% of Fortune 100 companies trust and use Apache Kafka
Event Streaming Maturity Model
Axes: INVESTMENT & TIME, VALUE
1. Initial Awareness / Pilot (1 Kafka Cluster)
2. Start to Build Pipeline / Deliver 1 New Outcome (1 Kafka Cluster)
3. Mission-Critical Deployment (Stretched, Hybrid, Multi-Region)
4. Build Contextual Event-Driven Apps (Stretched, Hybrid, Multi-Region)
5. Central Nervous System (Global Kafka)
Product, Support, Training, Partners, Technical Account Management...
Confluent Platform
FREEDOM OF CHOICE: Fully Managed Cloud Service | Self Managed Software
COMMITTER-DRIVEN EXPERTISE: Partners | Training | Professional Services | Enterprise Support
Apache Kafka
Open Source | Community licensed
UNRESTRICTED DEVELOPER PRODUCTIVITY (Developer)
SQL-based Stream Processing: KSQL (ksqlDB)
Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
Multi-language Development: non-Java clients | REST Proxy
EFFICIENT OPERATIONS AT SCALE (Operator)
GUI-driven Mgmt & Monitoring: Control Center
Flexible DevOps Automation: Operator | Ansible
Dynamic Performance & Elasticity: Auto Data Balancer | Tiered Storage
PRODUCTION-STAGE PREREQUISITES (Architect)
Enterprise-grade Security: RBAC | Secrets | Audit logs
Data Compatibility: Schema Registry | Schema Validation
Global Resilience: Multi-Region Clusters | Replicator
PARTNERSHIP FOR BUSINESS SUCCESS (Executive Buyer)
Complete Engagement Model
Revenue / Cost / Risk Impact
TCO / ROI
Global Event Streaming
Aggregate Small Footprint Edge Deployments with Replication (Aggregation)
Simplify Disaster Recovery Operations with Multi-Region Clusters with RPO=0 and RTO=0
Stream Data Globally with Replication and Cluster Linking
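The aggregation pattern, where several small edge deployments replicate into one central cluster, can be sketched as follows. This is a toy model, not a replication tool: the function `aggregate`, the cluster aliases, and the topic names are hypothetical. Prefixing each replicated topic with the source alias is an assumption for this sketch, modelled on the default naming that MirrorMaker 2 applies to remote topics.

```python
from typing import Dict, List

Cluster = Dict[str, List[str]]  # topic name -> records


def aggregate(edge_clusters: Dict[str, Cluster]) -> Cluster:
    """Copy every edge topic into the central cluster as '<alias>.<topic>'."""
    central: Cluster = {}
    for alias, topics in edge_clusters.items():
        for topic, records in topics.items():
            central[f"{alias}.{topic}"] = list(records)
    return central


# Hypothetical edge sites, each with its own small cluster.
edges = {
    "store-berlin": {"sensors": ["t=21.5", "t=21.7"]},
    "store-munich": {"sensors": ["t=19.0"]},
}

central = aggregate(edges)

# A central application can read all sites by matching the topic suffix.
all_sensor_records = [
    record
    for topic in sorted(central)
    if topic.endswith(".sensors")
    for record in central[topic]
]
```

Keeping the source alias in the topic name lets central consumers tell sites apart while still processing them together.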
Kai Waehner
Technology Evangelist
contact@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
www.confluent.io
LinkedIn
Questions? Feedback?
Let’s connect!
