FLiP Into Trino
Tim Spann | Developer Advocate
David Kjerrumgaard | Developer Advocate
streamnative.io
Founded by the original developers of
Apache Pulsar and Apache BookKeeper,
StreamNative builds a cloud-native event
streaming platform that enables
enterprises to easily access data as
real-time event streams.
Speaker Bio
David Kjerrumgaard
Developer Advocate
https://github.com/david-streamlio
https://pulsar-summit.org/en/event/virtual-conference-2020/speaker/david-kjerrumgaard
https://www.slideshare.net/streamnative/using-apache-pulsar-to-provide-realtime-iot-analytics-on-the-edgedavid
Tim Spann, Developer Advocate
DZone Zone Leader and Big Data MVB, Data DJay
FLiP(N) into Trino Stack
● Apache Flink
● Apache Pulsar
● StreamNative's Flink Connector for Pulsar
● Apache NiFi
● Trino
Apache Pulsar
Apache Pulsar is an open source, cloud-native
distributed messaging and streaming platform.
A Unified Messaging Platform
Apache Pulsar: Message Queuing + Data Streaming
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
○ MQTT
○ AMQP
○ JMS
○ Kafka
○ ...
Pulsar Cluster
● “Bookies” (Store Messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata & Service Discovery
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
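To make the “ensemble” bullet concrete, here is a minimal Python sketch of how the entries of one ledger could be striped across a group of bookies. It assumes round-robin placement with a write quorum smaller than the ensemble; the bookie names, the `write_quorum` parameter and both function names are illustrative assumptions, not BookKeeper's API.

```python
def write_set(entry_id, ensemble_size, write_quorum):
    """Return the ensemble indexes that would hold one entry (round-robin striping)."""
    if not 1 <= write_quorum <= ensemble_size:
        raise ValueError("write_quorum must be between 1 and ensemble_size")
    start = entry_id % ensemble_size
    return [(start + offset) % ensemble_size for offset in range(write_quorum)]


def place_entries(bookies, entry_ids, write_quorum):
    """Map each (hypothetical) bookie name to the entry ids it would store."""
    placement = {bookie: [] for bookie in bookies}
    for entry_id in entry_ids:
        for index in write_set(entry_id, len(bookies), write_quorum):
            placement[bookies[index]].append(entry_id)
    return placement


if __name__ == "__main__":
    # Three bookies form the ensemble; every entry is written to two of them.
    layout = place_entries(["bookie-1", "bookie-2", "bookie-3"], range(4), write_quorum=2)
    for bookie, entries in layout.items():
        print(bookie, entries)
```

With a write quorum equal to the ensemble size, every bookie holds every entry, which matches the fully replicated segments drawn in the Pulsar SQL diagram later in the deck.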
Apache Pulsar - Example Sinks
https://hub.streamnative.io/connectors/cloud-storage-sink/2.5.1/
● MongoDB
● AWS Lambda
● Redis
● AWS S3
● GCS
Pulsar API Design
[Diagram labels: Pub/Sub API (Publisher, Subscriber); Reader and Batch API; Pulsar IO/Connectors (Prebuilt Connectors, Custom Connectors); Admin API; Stream Processor Applications; Microservices or Event-Driven Architecture; Operators & Administrators; Teams; Tenant]
Subscription Modes
Different subscription modes have different semantics:
● Exclusive/Failover - guaranteed order, single active consumer
● Shared - multiple active consumers, no order
● Key_Shared - multiple active consumers, order for given key
[Diagram: Producer 1 and Producer 2 publish to a Pulsar Topic, which is read through four subscriptions]
● Subscription A (Exclusive): Consumer A
● Subscription B (Failover): Consumer B-1 and Consumer B-2; B-2 takes over in case of failure in Consumer B-1
● Subscription C (Shared): Consumer C-1 and Consumer C-2; messages arrive interleaved across keys: <K1,V10>, <K2,V21>, <K1,V12> and <K2,V20>, <K1,V11>, <K2,V22>
● Subscription D (Key-Shared): Consumer D-1 and Consumer D-2; messages stay grouped by key: <K1,V10>, <K1,V11>, <K1,V12> and <K2,V20>, <K2,V21>, <K2,V22>
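The semantics of the subscription modes can be sketched with a toy dispatcher. This is a simulation in plain Python, not the Pulsar client or the broker's real algorithm: the consumer names are made up, Shared is modeled as simple round-robin, and Key_Shared uses a CRC32 hash as a stand-in for the broker's own key-to-consumer assignment.

```python
import zlib
from itertools import cycle

# Keyed messages in publish order (keys and values are illustrative).
MESSAGES = [("K1", "V10"), ("K1", "V11"), ("K2", "V20"),
            ("K1", "V12"), ("K2", "V21"), ("K2", "V22")]


def dispatch_exclusive(messages, consumers):
    """Exclusive/Failover: only the first (active) consumer receives messages, in order."""
    received = {name: [] for name in consumers}
    received[consumers[0]] = list(messages)
    return received


def dispatch_shared(messages, consumers):
    """Shared: messages are spread round-robin, so no consumer sees an ordered stream."""
    received = {name: [] for name in consumers}
    for message, name in zip(messages, cycle(consumers)):
        received[name].append(message)
    return received


def dispatch_key_shared(messages, consumers):
    """Key_Shared: all messages with the same key go to the same consumer, in order."""
    received = {name: [] for name in consumers}
    for key, value in messages:
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)  # stand-in hash
        received[consumers[index]].append((key, value))
    return received


if __name__ == "__main__":
    print("Failover  :", dispatch_exclusive(MESSAGES, ["B-1", "B-2"]))
    print("Shared    :", dispatch_shared(MESSAGES, ["C-1", "C-2"]))
    print("Key_Shared:", dispatch_key_shared(MESSAGES, ["D-1", "D-2"]))
```

In the Shared output each consumer holds a mix of K1 and K2 messages, while in the Key_Shared output a given key never appears under more than one consumer.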
Unified Messaging Model
[Diagram: Producer 1 and Producer 2 publish messages m0-m4 to a Pulsar Topic/Partition; the subscriptions are labeled Streaming and Messaging]
● Subscription A (Exclusive): Consumer A-0 receives m0-m4; Consumer A-1 is marked X
● Subscription B (Failover): Consumer B-0 receives m0-m4; Consumer B-1 takes over in case of failure in Consumer B-0
● Subscription C (Shared): m0-m4 spread across Consumer C-1, Consumer C-2 and Consumer C-3
● Subscription D (Key-Shared): Consumer D-1, Consumer D-2 and Consumer D-3 receive <k2,v1>, <k2,v3>, <k3,v2>, <k1,v0>, <k1,v4>, grouped by key
MQTT on Pulsar (MoP)
MQTT on Pulsar (MoP) Configuration
# enable the MQTT protocol handler
messagingProtocols=mqtt
# directory that holds the protocol handler
protocolHandlerDirectory=./protocols
# MQTT 3.1.1 listener - ip / port
mqttListeners=mqtt://127.0.0.1:1883
advertisedAddress=127.0.0.1
Pulsar SQL
Using Trino for querying Pulsar topic data.
https://pulsar.apache.org/docs/en/sql-overview/
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Diagram: a Producer and a Consumer connect to Broker 1 (Topic1-Part1), Broker 2 (Topic1-Part2) and Broker 3 (Topic1-Part3). Segment 1 through Segment X are each stored on Bookie 1, Bookie 2 and Bookie 3. A Query goes to the Query Coordinator, which uses Topic Metadata; the SQL Workers read the segments.]
Pulsar SQL
https://streamnative.io/blog/case/2020-05-07-zhaopin-sql/
Query Your Topics with Pulsar SQL (Trino)
Running a Pulsar SQL Query
● To run a query, you need to start Pulsar SQL with:
$ pulsar sql
● All queries must:
○ Be terminated with a ;
○ Use single quotes (') for strings
● If you run a query with many results, Pulsar SQL will show a scrollable list
○ Scroll through results with the up and down arrows or page up and
page down keys
○ Exit out by typing q
● Queries can be run using Presto's/Trino’s REST API
○ Query results are returned as JSON
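The REST path in the last bullet can be sketched with the Python standard library. This is a hedged sketch, not an official client: it assumes the SQL worker listens on `http://localhost:8081`, that the user is passed in an `X-Trino-User` header (older Presto-based builds use `X-Presto-User`), and that the statement is sent without the trailing semicolon the CLI needs. The function names and the `demo-user` value are hypothetical.

```python
import json
import urllib.request


def build_statement_request(sql, base_url="http://localhost:8081",
                            user="demo-user", user_header="X-Trino-User"):
    """Build (but do not send) a POST to the /v1/statement endpoint."""
    statement = sql.strip().rstrip(";")  # assumption: no trailing ';' over HTTP
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/statement",
        data=statement.encode("utf-8"),
        headers={user_header: user},
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8081",
              user="demo-user", user_header="X-Trino-User"):
    """Send the statement and follow nextUri links, collecting the JSON rows."""
    rows = []
    request = build_statement_request(sql, base_url, user, user_header)
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        # Results arrive in pages; each page points at the next one until done.
        request = (urllib.request.Request(next_uri, headers={user_header: user})
                   if next_uri else None)
    return rows


if __name__ == "__main__":
    req = build_statement_request("SHOW schemas IN pulsar;")
    print(req.get_method(), req.full_url, req.data)
```

`run_query` needs a running SQL worker; `build_statement_request` can be inspected offline to see exactly what would be sent.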
Viewing Topics with Pulsar SQL
● Show available namespaces
SHOW schemas IN pulsar;
● Show topics in a namespace
SHOW tables IN pulsar."public/default";
● Show schema in a topic
SHOW columns IN pulsar."public/default".mytopic;
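The three statements above follow one naming pattern: the catalog is `pulsar`, the schema is the `tenant/namespace` pair (quoted because of the slash), and the table is the topic. A small Python helper, with a hypothetical name, makes the pattern explicit; it quotes the topic as well, which is an assumption that keeps topic names with hyphens or other special characters valid.

```python
def pulsar_table(tenant, namespace, topic, catalog="pulsar"):
    """Build a fully qualified table name for a Pulsar topic."""
    def quote(identifier):
        # SQL identifiers are wrapped in double quotes; embedded quotes are doubled.
        return '"' + identifier.replace('"', '""') + '"'

    return f"{catalog}.{quote(tenant + '/' + namespace)}.{quote(topic)}"


if __name__ == "__main__":
    table = pulsar_table("public", "default", "mytopic")
    print(f"SHOW columns IN {table};")
    print(f"SELECT * FROM {table} LIMIT 10;")
```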
Supported SQL Syntax
SELECT card, suit FROM cards;
SELECT * FROM cards WHERE suit = 'Spade';
SELECT * FROM cards WHERE card LIKE '1%';
SELECT * FROM cards WHERE suit = 'Spade' AND card = '1';
SELECT * FROM cards LIMIT 10;
SELECT * FROM cards WHERE suit = 'Spade' LIMIT 10;
SELECT suit, COUNT(card) FROM cards GROUP BY suit;
SELECT suit, card FROM cards ORDER BY suit, card;
Defining Schemas
To execute a query, Pulsar SQL needs to know the schema.
● Schemas are accessible from the Broker and stored in BookKeeper.
● Pulsar SQL needs to know:
○ Name of the column
○ Type of the column
○ Nullability of the column
● Pulsar SQL currently supports Avro and JSON for automatic schema
detection.
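As an illustration of the three things Pulsar SQL needs per column, the sketch below reads a small Avro record schema and lists name, type and nullability. The `Card` schema is a made-up example matching the `cards` queries, and `describe_columns` is a hypothetical helper, not Pulsar SQL's actual type mapping; it only handles primitive types and simple nullable unions.

```python
# Hypothetical Avro schema for the "cards" topic used in the example queries.
CARD_SCHEMA = {
    "type": "record",
    "name": "Card",
    "fields": [
        {"name": "card", "type": "string"},
        # A union with "null" is how Avro marks a field as optional.
        {"name": "suit", "type": ["null", "string"], "default": None},
    ],
}


def describe_columns(avro_schema):
    """Return (name, type, nullable) for each field of an Avro record schema."""
    columns = []
    for field in avro_schema["fields"]:
        field_type = field["type"]
        if isinstance(field_type, list):
            nullable = "null" in field_type
            non_null = [t for t in field_type if t != "null"]
            type_name = non_null[0] if len(non_null) == 1 else "union"
        else:
            nullable = False
            type_name = field_type
        columns.append((field["name"], type_name, nullable))
    return columns


if __name__ == "__main__":
    for name, type_name, nullable in describe_columns(CARD_SCHEMA):
        print(f"{name}: {type_name}, nullable={nullable}")
```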
Use cases for Pulsar SQL
● Pulsar SQL is a useful tool for answering questions about data in your
streams, such as basic analytics or searching for specific data.
● Pulsar SQL is not intended for high throughput queries or for running
“continuous” queries that update as new records are added.
https://pulsar.apache.org/docs/en/sql-rest-api/
StreamNative Cloud
A cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
● Powered by Pulsar
● Built for Containers
● Cloud Native
● Flink SQL
[Diagram: StreamNative Solution. APP Layer: Micro Service, Notification, Dashboard, Risk Control, Auditing, Payment, ETL. StreamNative Platform: Application Messaging, Data Pipelines, Real-time Contextual Analytics, with a Computing Layer and a Storage Layer (Tiered Storage). IaaS Layer.]
Use Cases
IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices create data ingestion challenges.
Other Sources: Transit data, news, Twitter, status feeds, REST data, stock data and more.
StreamNative Hub
StreamNative Cloud
● Unified Batch and Stream COMPUTING: Batch (Batch + Stream)
● Unified Batch and Stream STORAGE: Offload (Queuing + Streaming)
End-to-End Streaming FLiPN Edge AI Application
Apache Flink - Apache Pulsar - Apache NiFi <-> Trino
[Diagram labels: Streaming Edge Gateway; Protocols: Pulsar, KoP, MoP, Websocket, HTTP; Tiered Storage; Pulsar Sink (x2)]
https://github.com/tspannhw/minifi-xaviernx/
https://github.com/tspannhw/minifi-jetson-nano
https://github.com/tspannhw/Flip-iot
https://github.com/tspannhw/FLiP-EdgeAI
https://github.com/tspannhw/FLiP-CloudIngest
https://github.com/tspannhw/FLiP-Transit
https://github.com/tspannhw/FLiP-Jetson
https://www.datainmotion.dev/2020/10/flank-streaming-edgeai-on-new-nvidia.html
DEMO TIME
Using NVIDIA Jetson Devices With Pulsar
Show Me More Data
Ingesting IoT Data via Java Pulsar
https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/
FLiP Into Trino with Apache NiFi
Apache NiFi to JDBC Sink
https://docs.starburst.io/data-consumer/clients/jdbc.html
https://docs.starburst.io/data-consumer/clients/dbeaver.html
https://hub.docker.com/r/trinodb/trino
https://trino.io/download.html
docker run -p 8080:8080 --name trino trinodb/trino
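Once the container above is running, a JDBC client such as NiFi or DBeaver needs a connection URL and a driver class. The Python sketch below only assembles those strings; the helper name is hypothetical, and the `tpch`/`tiny` catalog and schema are an assumption about what the stock `trinodb/trino` image ships with. Substitute the catalog you actually query.

```python
# Driver class of the Trino JDBC driver.
TRINO_DRIVER_CLASS = "io.trino.jdbc.TrinoDriver"


def trino_jdbc_url(host="localhost", port=8080, catalog=None, schema=None):
    """Build a jdbc:trino:// URL; catalog and schema are optional path segments."""
    if schema and not catalog:
        raise ValueError("a schema can only be given together with a catalog")
    url = f"jdbc:trino://{host}:{port}"
    if catalog:
        url += f"/{catalog}"
    if schema:
        url += f"/{schema}"
    return url


if __name__ == "__main__":
    # Port 8080 matches the -p 8080:8080 mapping in the docker run command.
    print(TRINO_DRIVER_CLASS)
    print(trino_jdbc_url(catalog="tpch", schema="tiny"))
```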
Deeper Content
● https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html
● https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html
● https://github.com/tspannhw/EverythingApacheNiFi
● https://github.com/tspannhw/CloudDemo2021
● https://github.com/tspannhw/FLiP-Into-Trino
● https://github.com/tspannhw/StreamingSQLExamples
● https://www.linkedin.com/pulse/2021-schedule-tim-spann/
● https://github.com/tspannhw/StreamingSQLExamples/blob/8d02e62260e82b027b43abb911b5c366a3081927/README.md
● https://www.pulsardeveloper.com/
● https://streamnative.io/success-story/zhaopin-pulsar-sql/
Let’s Keep in Touch!
Tim Spann
Developer Advocate
@PassDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
Connect with the Community & Stay Up-To-Date
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Subscribe to the Monthly Pulsar Newsletter for major news, events, project updates, and resources in the Pulsar community
Pulsar Summit Asia
November 20-21, 2021
Contact us at partners@pulsar-summit.org to become a sponsor or partner
Interested In Learning More?
● Resources: Flink SQL Cookbook; The GitHub Source for Flink SQL Demo; The GitHub Source for Demo
● Free eBooks: Manning's Apache Pulsar in Action; O’Reilly Book
● Upcoming Events: [10/21] Trino Summit
Questions
