Utilizing Apache Pulsar, Apache NiFi and MiNiFi for EdgeAI IoT at Scale
Tim Spann | Developer Advocate
streamnative.io
Tim Spann
Developer Advocate
● https://www.datainmotion.dev/
● https://github.com/tspannhw/SpeakerProfile
● https://dev.to/tspannhw
● https://sessionize.com/tspann/
DZone Zone Leader and Big Data MVB
Data DJay
Founded by the original developers of
Apache Pulsar and Apache BookKeeper,
StreamNative builds a cloud-native event
streaming platform that enables
enterprises to easily access data as
real-time event streams.
Apache Pulsar
Apache Pulsar is an open-source, cloud-native
distributed messaging and streaming platform.
What are the Benefits of Pulsar?
● Data Durability
● Scalability
● Geo-Replication
● Multi-Tenancy
● Unified Messaging Model
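Multi-tenancy is visible in how Pulsar names topics: `{persistent|non-persistent}://tenant/namespace/topic`, where a bare name such as `iot-sensors` expands to `persistent://public/default/iot-sensors`. A minimal Python sketch of splitting a name into those levels follows; `parse_topic` and `TopicName` are hypothetical helpers (not part of any Pulsar client), and only the current three-level form is handled.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    """Levels of a fully qualified Pulsar topic name (hypothetical helper type)."""
    domain: str     # "persistent" or "non-persistent"
    tenant: str     # top-level isolation unit, e.g. a team or customer
    namespace: str  # administrative unit inside a tenant
    topic: str      # local topic name


def parse_topic(name: str) -> TopicName:
    """Split a topic name into domain, tenant, namespace and local name.

    Names without a domain default to "persistent"; a bare topic name
    additionally defaults to the public tenant and default namespace.
    """
    if "://" in name:
        domain, rest = name.split("://", 1)
        if domain not in ("persistent", "non-persistent"):
            raise ValueError(f"unknown topic domain: {domain!r}")
        parts = rest.split("/")
    else:
        domain = "persistent"
        parts = name.split("/")
        if len(parts) == 1:
            parts = ["public", "default"] + parts
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {name!r}")
    return TopicName(domain, *parts)
```

For example, `parse_topic("acme/edge/sensors")` yields tenant `acme` and namespace `edge`, the two levels at which Pulsar applies isolation and policies.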
Apache Pulsar
A Unified Messaging Platform
Message Queuing
Data Streaming
Apache Pulsar Overview
Enable Geo-Replicated Messaging
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four subscription types (Exclusive, Shared, Failover, Key_Shared)
● Multi-Protocol Support
○ MQTT
○ AMQP
○ JMS
○ Kafka
○ ...
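The subscription types differ mainly in how one subscription spreads messages over the consumers attached to it. The toy Python model below illustrates that difference; it is not Pulsar code, `dispatch` is a hypothetical name, and two simplifications are labeled in the comments (Failover's active consumer is simply the first in the list, and Key_Shared uses CRC32 modulo instead of Pulsar's hash-range assignment).

```python
from collections import defaultdict
from zlib import crc32


def dispatch(sub_type, consumers, messages):
    """Toy model of how one subscription spreads messages across consumers.

    consumers: consumer names attached to the same subscription.
    messages:  (key, payload) tuples in publish order.
    Returns a dict mapping consumer name -> list of payloads it receives.
    """
    if not consumers:
        raise ValueError("a subscription needs at least one consumer")
    received = defaultdict(list)
    if sub_type == "Exclusive":
        # Only one consumer may attach; a second one is rejected.
        if len(consumers) > 1:
            raise ValueError("Exclusive subscriptions allow a single consumer")
        received[consumers[0]] = [p for _, p in messages]
    elif sub_type == "Failover":
        # One active consumer gets everything; the rest are standbys.
        # Simplification: the first consumer in the list is the active one.
        received[consumers[0]] = [p for _, p in messages]
    elif sub_type == "Shared":
        # Round-robin across consumers; per-key ordering is not preserved.
        for i, (_, payload) in enumerate(messages):
            received[consumers[i % len(consumers)]].append(payload)
    elif sub_type == "Key_Shared":
        # All messages with the same key go to the same consumer.
        # Simplification: CRC32 modulo, not Pulsar's hash-range scheme.
        for key, payload in messages:
            owner = consumers[crc32(key.encode("utf-8")) % len(consumers)]
            received[owner].append(payload)
    else:
        raise ValueError(f"unknown subscription type: {sub_type!r}")
    return dict(received)
```

Shared maximizes parallelism at the cost of ordering; Key_Shared keeps parallelism while preserving order per key, which suits per-sensor IoT streams.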
What is the Pulsar Ecosystem?
● Functions and Connectors
○ Functions: Lightweight stream processing
○ Connectors: part of “Pulsar IO”, with “Source” and “Sink” APIs
■ Files, databases, data tools, cloud services, etc.
● Protocol Handlers
○ Allows Pulsar to handle additional protocols through an extensible API running in the broker
■ AoP (AMQP), KoP (Kafka), MoP (MQTT)
What is the Pulsar Ecosystem? (cont’d)
● Processing Engines
○ Supports modern processing engines
■ Flink and Spark, as well as Pulsar SQL (Presto/Trino)
● Offloaders
○ Allows data to be offloaded to cloud storage while remaining readable through the existing Pulsar APIs
■ S3, Google Cloud Storage, HDFS, File (NFS), Azure Blob Storage (in Pulsar 2.7.0)
Pulsar Functions
Provides a simple API to:
● Receive a message (consume)
● Process the message using your own code
● Send a message (produce)
Takes care of the boilerplate code so there is no need to create
producers and consumers.
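Pulsar can also run a plain Python function, such as `process(input)`, as a "native" Pulsar Function with no SDK import: the runtime consumes each message, calls the function, and publishes the return value to the output topic. A minimal sketch, assuming JSON sensor readings with a hypothetical `temp_c` field; the input may arrive as `str` or `bytes`, so both are accepted.

```python
import json


def process(input):
    """Native Python Pulsar Function: enrich one sensor reading.

    Assumes each message is a JSON object with a hypothetical "temp_c" field.
    The returned string is what the runtime publishes to the output topic.
    """
    if isinstance(input, bytes):
        input = input.decode("utf-8")
    reading = json.loads(input)
    # Add a Fahrenheit value alongside the original Celsius reading.
    reading["temp_f"] = round(reading["temp_c"] * 9 / 5 + 32, 2)
    return json.dumps(reading)
```

Because the function never touches producers or consumers, the same code can be unit-tested locally before it is deployed to a cluster.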
Moving Data In and Out of Pulsar
IO/Connectors are a simple way to integrate with external systems and move data
in and out of Pulsar.
● Built on top of Pulsar Functions
● Built-in connectors - hub.streamnative.io
[Diagram: Source and Sink connectors]
MQTT on Pulsar (MoP)
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Diagram: a Producer and a Consumer on Brokers 1-3 serving Topic1-Part1 to Topic1-Part3; Segments 1 to X stored across Bookies 1-3; a Query Coordinator using Topic Metadata to dispatch a Query to several SQL Workers.]
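In Pulsar SQL, each topic appears as a table in the `pulsar` catalog, with the quoted `tenant/namespace` pair as the schema, e.g. `SELECT * FROM pulsar."public/default"."iot-sensors"`. A small Python sketch that builds such a query string; the helper names and the `iot-sensors` topic are hypothetical, while `__publish_time__` is one of the metadata columns Pulsar SQL exposes for each message.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pulsar_sql_table(topic: str) -> str:
    """Map a Pulsar topic name to its Pulsar SQL table reference."""
    if "://" in topic:
        topic = topic.split("://", 1)[1]  # drop the persistent:// prefix
    parts = topic.split("/")
    if len(parts) == 1:
        parts = ["public", "default"] + parts  # bare names live in public/default
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {topic!r}")
    tenant, namespace, local = parts
    return f"pulsar.{quote_ident(tenant + '/' + namespace)}.{quote_ident(local)}"


def latest_readings_query(topic: str, limit: int = 10) -> str:
    """Build a query for the most recently published messages on a topic."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (f"SELECT * FROM {pulsar_sql_table(topic)} "
            f"ORDER BY __publish_time__ DESC LIMIT {limit}")
```

The resulting string can be submitted through any Presto/Trino client connected to the Pulsar SQL coordinator.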
Ingesting IoT Data via the Pulsar Java Client
https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/
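The linked repository ingests readings with the Pulsar Java client. As a rough Python analogue, assuming the `pulsar-client` package, a producer only needs to serialize each reading and call `send`. The reading fields, topic, and helper names below are hypothetical; keying by sensor ID is a design choice so that one sensor's readings are routed together.

```python
import json
import time
import uuid


def build_reading(sensor_id, temp_c, humidity, ts_ms=None):
    """Assemble one IoT reading (hypothetical schema) as a dict."""
    return {
        "uuid": str(uuid.uuid4()),
        "sensor_id": sensor_id,
        "temp_c": temp_c,
        "humidity": humidity,
        # Epoch milliseconds; defaults to the current time.
        "ts": int(ts_ms if ts_ms is not None else time.time() * 1000),
    }


def send_reading(producer, reading):
    """Serialize a reading to JSON bytes and publish it.

    `producer` is anything with send(content, partition_key=...), such as a
    pulsar.Producer; the sensor ID is used as the message key.
    """
    payload = json.dumps(reading).encode("utf-8")
    producer.send(payload, partition_key=reading["sensor_id"])
    return payload


def run(service_url="pulsar://localhost:6650",
        topic="persistent://public/default/iot-sensors"):
    """Publish one sample reading to a running broker (needs pulsar-client)."""
    import pulsar  # imported lazily so the helpers above work without it

    client = pulsar.Client(service_url)
    try:
        producer = client.create_producer(topic)
        send_reading(producer, build_reading("sensor-1", 22.5, 41.0))
    finally:
        client.close()
```

Keeping `send_reading` independent of the client object makes it easy to test with a stand-in producer; only `run()` needs a live broker.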
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over sixty sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version Control
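Back pressure and pressure release can be pictured with a toy model of a single NiFi connection queue. This is a Python sketch, not NiFi code (`Connection` and its methods are hypothetical): the 10,000-FlowFile and 1 GB defaults mirror NiFi's default back-pressure thresholds for new connections, and age-based dropping stands in for NiFi's FlowFile Expiration setting, measured here from enqueue time as a simplification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Toy model of a NiFi connection with back pressure and expiration."""
    object_threshold: int = 10_000     # FlowFile count threshold
    size_threshold: int = 1024 ** 3    # data size threshold in bytes (1 GB)
    expiration_secs: float = 0.0       # 0 disables pressure release
    queue: deque = field(default_factory=deque)  # (enqueue_time, size) pairs
    queued_bytes: int = 0

    def back_pressure_applied(self) -> bool:
        """True once either threshold is reached; upstream then stops."""
        return (len(self.queue) >= self.object_threshold
                or self.queued_bytes >= self.size_threshold)

    def release(self, now: float) -> int:
        """Drop FlowFiles older than expiration_secs; return how many."""
        if self.expiration_secs <= 0:
            return 0
        dropped = 0
        while self.queue and now - self.queue[0][0] > self.expiration_secs:
            _, size = self.queue.popleft()
            self.queued_bytes -= size
            dropped += 1
        return dropped

    def offer(self, size: int, now: float) -> bool:
        """Try to enqueue a FlowFile; refused while back pressure is on."""
        self.release(now)
        if self.back_pressure_applied():
            return False
        self.queue.append((now, size))
        self.queued_bytes += size
        return True
```

Back pressure protects downstream systems by pausing producers, while expiration trades completeness for freshness; which one to lean on depends on the flow's loss tolerance.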
Architecture
https://nifi.apache.org/docs/nifi-docs/html/overview.html
End-to-End Streaming FLiP(N) IoT Apps
[Diagram: StreamNative Hub; StreamNative Cloud; Unified Batch and Stream COMPUTING (Batch, Batch + Stream); Unified Batch and Stream STORAGE (Queuing + Streaming) with Offload to Tiered Storage; Apache Pulsar - Apache NiFi - MiNiFi <-> Events/Messages <-> Data Stores; Edge Gateway protocols: Pulsar, KoP, MoP, WebSocket, HTTP; Pulsar Sinks; Streaming.]
Demo
Wrap-Up
Interested In Learning More?
Resources
● Flink SQL Cookbook
● The GitHub Source for Flink SQL Demo
● The GitHub Source for Demo
Free eBooks
● Manning's Apache Pulsar in Action
● O’Reilly Book
Upcoming Events
● [11/8] PASS Data Community
● [11/18] Developer Week Austin
● [11/19] Porto Tech Hub Con
● [12/3] Data Science Camp
Let’s Keep in Touch!
Timothy Spann
Developer Advocate
@PaasDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw

AI DevWorld: Utilizing Apache Pulsar, Apache NiFi and MiNiFi for EdgeAI IoT at Scale
