Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Ecosystem
Simplify Pulsar Functions Development with SQL
Neng Lu
Platform Engineering Lead • StreamNative
Neng Lu is the platform engineering
lead of compute at StreamNative. He
drives the development of Pulsar
Functions, Serverless Computing and
ecosystem integration. He is also a
committer of Apache Pulsar.
Rui Fu is a senior software engineer at
StreamNative and a committer of
Apache Pulsar. He actively contributes
to Pulsar Functions, Function Mesh and
Serverless Computing.
Pulsar Functions
Pulsar Functions – Recap
Pulsar Functions – Use Cases
● ETL (Extract-Transform-Load) Jobs
● Microservices
● Event Routing
● Real-time Aggregation
Pulsar Functions – Benefits
● Easy Operation
○ Fully Integrated with Pulsar
○ No Extra Setup Needed
● Easy Development
○ Intuitive APIs:
■ Java: public O process(I input, Context context)
■ Python: def process(self, input, context)
■ Golang: func HandleRequest(ctx context.Context, in []byte) error
Easier Operation?
Function Worker Recap
● Function Worker interleaves with Pulsar Broker
● Need to set up separate Function Worker cluster
● Function Worker relies on Pulsar Topics for scheduling
● Function Worker’s k8s runtime not truly cloud native
Function Mesh
Function Mesh – Recap
● Serverless framework to run Pulsar Functions in a cloud native way
● Consists of:
○ Set of CRDs for defining Pulsar Functions and Connectors
■ Function
■ Source
■ Sink
○ Operator that constantly reconciles the submitted CR
■ create StatefulSet, Service, ConfigMap, etc.
■ update according to user change
■ auto-scale if configured
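The reconcile behavior listed above can be pictured as a pure function that compares the desired state from a submitted CR with what is observed in the cluster and returns the actions to take. This is an illustrative model only, not Function Mesh's actual operator code; the dictionary layout and the example specs are assumptions.

```python
def reconcile(desired, observed):
    """Return (action, kind) steps that move the observed state toward the desired one.

    Both arguments map a resource kind (e.g. "StatefulSet") to its spec dict.
    The layout is hypothetical and chosen only for illustration.
    """
    actions = []
    for kind, spec in desired.items():
        if kind not in observed:
            actions.append(("create", kind))   # resource is missing
        elif observed[kind] != spec:
            actions.append(("update", kind))   # user changed the spec
    return actions


desired = {"StatefulSet": {"replicas": 2}, "Service": {"port": 9093}}
observed = {"StatefulSet": {"replicas": 1}}
print(reconcile(desired, observed))
```

Running the same function again once the observed state matches the desired state yields no actions, which is the property that lets an operator reconcile continuously.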
Function Mesh – Architecture
Function Mesh – Summary
● Scheduling by Kubernetes, not the Function Worker
○ Simplicity
○ Reliability
○ Stability (both for functions & brokers)
○ Extensibility (HPA, VPA, Scale-To-Zero, etc.)
● Compatible with Pulsar Admin REST API
○ Seamless user experience
Easier Development?
Use Case 1 – Filtering/Routing
● Commonly used for different business purposes → duplicated
code development
● Go through the whole Pulsar Functions dev life cycle
○ (Learn)
○ Develop
○ Package
○ Debug
○ Deploy
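To make the duplicated effort concrete, here is a minimal sketch of the kind of hand-written routing function this use case describes, using the Python `process(self, input, context)` shape. `FakeContext`, the topic names, and the `amount` field are hypothetical; in a real deployment the class would extend the Pulsar Functions base class and the runtime would supply the context.

```python
import json


class FakeContext:
    """Stand-in for the runtime-provided context; it only records publishes."""

    def __init__(self):
        self.published = []

    def publish(self, topic_name, message):
        self.published.append((topic_name, message))


class AmountRouter:
    """Routes JSON orders by amount. Threshold and topic names are hypothetical."""

    def process(self, input, context):
        order = json.loads(input)
        if order.get("amount", 0) >= 1000:
            context.publish("large-orders", input)
        else:
            context.publish("small-orders", input)
        return None  # nothing goes to the default output topic
```

Each new business rule of this kind typically differs only in the predicate and the target topics, yet still needs its own develop, package, debug, and deploy cycle.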
Use Case 2 – Connector with Transformations
● Long pipeline:
○ Connector
○ Transformation Function (Often duplicated with minor diffs)
○ Intermediate topic
● Go through the Pulsar Functions life cycle TWICE:
○ Connector
■ Develop(optional)
■ …
○ Transformation Function
■ Develop
■ Package
■ …
Any Solution?
SELECT * FROM StreamNative
SQL Abstraction – Why?
● Easiest to learn and apply
● Wide audience
● Safe & Controlled Operations
● Easy job life-cycle management
● Stream Processing Trend
SQL Abstraction – What?
● IS
○ a simplified way to develop Pulsar Functions pipelines
● IS NOT
○ an interactive tool to run ad-hoc queries
SQL Abstraction – Components
● Gateway
● Runner
● CLI
SQL Abstraction – Gateway
● Parser <-> Runner
● REST API Server <-> CLI
SQL Gateway – Parser
● ANTLR4 grammar
● AST processor
● JSON representation
SQL Statement → Abstract Syntax Tree → JSON Representation
Parser – Grammar
SQL Abstraction – Syntax
● Value Expression
○ Literal: Primitive value, like string, number, or boolean
○ Field: message payload field
○ KEY: message key
○ PROPERTIES[P_KEY]: message property
● WITH Item Definition
○ WITH MERGE KEYVALUE: Merge the fields of KeyValue
schema
○ WITH UNWRAP KEY|VALUE: Extract Key or Value fields from
KeyValue schema
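As a rough illustration of how the four value-expression kinds might resolve against a message, the sketch below models a message as a key, a payload dictionary, and a properties map. The `Message` class and the tuple encoding of expressions are hypothetical and are not taken from the actual grammar.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Simplified stand-in for a Pulsar message (hypothetical layout)."""
    key: str
    payload: dict
    properties: dict = field(default_factory=dict)


def evaluate(expr, msg):
    """Resolve one value expression, encoded here as a (kind, argument) tuple."""
    kind, arg = expr
    if kind == "literal":
        return arg                      # primitive value used as-is
    if kind == "field":
        return msg.payload.get(arg)     # message payload field
    if kind == "key":
        return msg.key                  # message key
    if kind == "property":
        return msg.properties.get(arg)  # PROPERTIES[P_KEY]
    raise ValueError(f"unknown expression kind: {kind}")
```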
Parser – Examples
Parser – AST
Parser – JSON Representation
● Intermediate Representation
○ Filter
○ Router
○ Projection
○ WITH Conditions
SQL Abstraction – Runner
● An implementation of Pulsar
Functions API
● Accept the JSON
representation
● Generate Filtering/Routing
processor during initialization
● Utilize `GenericObject` to
handle different schemas
● Directly push result into target
topic
SQL Abstraction – Runner
● Processor
○ An interface for classes that
implement data transformations
○ schema projections
○ data manipulations
○ data type conversions
● Chain Compiler
○ List<Processor>
○ Compiled from the SQL Context
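One way to picture the Processor and Chain Compiler: each processor is a small step that transforms or drops a record, and the compiler turns a parsed query description into an ordered list of such steps. The sketch below is written in Python for brevity (the slides use Java notation), and the dictionary keys `filter` and `projection` are hypothetical stand-ins for the JSON representation.

```python
def make_filter(field_name, expected):
    """Processor that drops records whose field differs from the expected value."""
    def step(record):
        return record if record.get(field_name) == expected else None
    return step


def make_projection(fields):
    """Processor that keeps only the listed fields."""
    def step(record):
        return {name: record[name] for name in fields if name in record}
    return step


def compile_chain(query):
    """Build the ordered processor list from a parsed query description."""
    chain = []
    if "filter" in query:
        chain.append(make_filter(query["filter"]["field"], query["filter"]["equals"]))
    if "projection" in query:
        chain.append(make_projection(query["projection"]))
    return chain


def run_chain(chain, record):
    """Apply each processor in order; None means the record was filtered out."""
    for step in chain:
        record = step(record)
        if record is None:
            return None
    return record
```

Building the chain once, at initialization, keeps the per-message path down to a loop over prebuilt steps.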
SQL Gateway – REST APIs
● Query Management
○ POST /snsql/query
○ GET /snsql/query/pause/$NAME
○ GET /snsql/query/resume/$NAME
○ GET /snsql/query/delete/$NAME
○ GET /snsql/query/status/$NAME
○ GET /snsql/query/stats/$NAME
● Gateway Information
○ GET /snsql/info
○ GET /snsql/healthcheck
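A small client-side sketch of the routes listed above, using only the Python standard library. It builds request objects without sending them; the base URL is a placeholder, and the body format for `POST /snsql/query` is not shown in the slides, so that call is left out.

```python
from urllib.parse import quote
from urllib.request import Request

# Per-query actions taken from the route list; all of them use GET.
QUERY_ACTIONS = {"pause", "resume", "delete", "status", "stats"}


def query_action_request(base_url, action, name):
    """Build (without sending) the GET request for a per-query action."""
    if action not in QUERY_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url.rstrip('/')}/snsql/query/{action}/{quote(name, safe='')}"
    return Request(url, method="GET")


def info_request(base_url):
    """Build the GET request for gateway information."""
    return Request(f"{base_url.rstrip('/')}/snsql/info", method="GET")
```

A request built this way can be sent with `urllib.request.urlopen(req)` against a running gateway.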
SQL Gateway – REST Server
● Quarkus Framework
○ easy to implement
○ cloud-native support
● Metadata Management
○ write into Pulsar topic
○ read with TableView API
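The write-to-topic, read-with-TableView pattern amounts to folding an ordered log of keyed updates into a latest-value-per-key map. The in-memory sketch below imitates that idea without a broker; the query names and states are made up, and treating a `None` value as a delete is an assumption of this sketch.

```python
def build_table_view(events):
    """Fold an ordered (key, value) log into a latest-value-per-key map."""
    view = {}
    for key, value in events:
        if value is None:
            view.pop(key, None)  # assumed tombstone convention
        else:
            view[key] = value    # later updates overwrite earlier ones
    return view


events = [
    ("q1", "RUNNING"),
    ("q2", "RUNNING"),
    ("q1", "PAUSED"),
    ("q2", None),
]
print(build_table_view(events))
```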
SQL Abstraction – CLI
● Terminal based tool
● Interact with the
SQL gateway APIs
● Query management
SQL Abstraction – Summary
Demo
Future Work
● Syntax support for Source/Sink
● Built-in system function support
● Aggregation Operation
● Join Operation
Resources
● Pulsar Functions:
https://pulsar.apache.org/docs/functions-overview/
● Function Mesh: https://functionmesh.io/
● Slack & Mailing List:
○ Apache Pulsar Slack: https://apache-pulsar.slack.com/
○ StreamNative Community Slack:
https://streamnativecommunity.slack.com/
○ Apache Pulsar Mailing List:
■ users@pulsar.apache.org
■ dev@pulsar.apache.org
Thank you!
Neng Lu: nlu@streamnative.io, @nlu90
Rui Fu: rfu@streamnative.io, @freeznet