Serverless Event Streaming
with Pulsar Functions
Sijie Guo (@sijieg)
2019.06.22
Who am I
● Apache Pulsar PMC Member
● Apache BookKeeper PMC Chair
● Twitter, Yahoo Alumni
● Founder of StreamNative
● Interested in technologies around Event Streaming
Agenda
● What is Apache Pulsar?
● Event Stream - Pulsar's view on data
● When Event Streaming meets serverless
○ Programming Model
○ Architecture
○ Use cases
What is Apache Pulsar?
“Flexible pub/sub messaging backed by a durable stream storage”
Pulsar - Pub/Sub
Pulsar - Multi Tenancy
Pulsar - Flexible Messaging
● One copy of the data, different ways to consume it
● Queuing (aka stateless messaging)
○ Shared (similar to RabbitMQ)
● Streaming (aka stateful messaging)
○ Exclusive
○ Failover (similar to Kafka)
○ Key_Shared
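To make "one copy of the data, different ways to consume it" concrete, here is a simplified model of how one stream of keyed messages could be spread across consumers under different subscription types. It is a sketch, not the broker's dispatcher: Shared is approximated by round-robin, Key_Shared by hashing the key with `zlib.crc32` (an assumption; Pulsar's Key_Shared mode uses its own hash ranges), and Failover is not modeled separately because, like Exclusive, it has one active consumer at a time.

```python
import zlib
from collections import defaultdict


def dispatch(messages, consumers, mode):
    """Assign (key, value) messages to consumers; returns {consumer: [messages]}."""
    assigned = defaultdict(list)
    for i, (key, value) in enumerate(messages):
        if mode == "shared":
            # Round-robin: any consumer may receive any message, so per-key order is lost.
            target = consumers[i % len(consumers)]
        elif mode == "key_shared":
            # Same key always goes to the same consumer, so per-key order is preserved.
            target = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        elif mode == "exclusive":
            # A single consumer receives the whole stream in order.
            target = consumers[0]
        else:
            raise ValueError(f"unknown mode: {mode}")
        assigned[target].append((key, value))
    return dict(assigned)


messages = [("user-1", 1), ("user-1", 2), ("user-2", 1), ("user-1", 3)]
consumers = ["c0", "c1"]
for mode in ("shared", "key_shared", "exclusive"):
    print(mode, dispatch(messages, consumers, mode))
```

Under Shared, the three `user-1` messages end up split across both consumers; under Key_Shared they all land on one consumer in their original order, which is why Key_Shared sits under "streaming" in the list above.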
Pulsar - Cloud Native Architecture
Layered Architecture
❏ Independent scalability
❏ Instance failure recovery
❏ Rebalance-free cluster expansion
A Pulsar view on Data
Pulsar View - Topic
Pulsar View - Partition
Pulsar View - Segment
Pulsar View - Event Stream
Event Stream is the right foundation for your data
M(essaging), S(torage), P(rocessing)
MSP - Interactive Queries
MSP - Stream & Batch Processing
MSP - What is next?
What is next?
When Event Streaming meets Serverless
Introduce Pulsar Functions
Pulsar Functions
● A serverless event streaming framework
● Lightweight computation
● Event-first, Stream-first
● Multiple languages
● Multiple runtimes
● SDK-less & SDK-based APIs
Function Elements
● Input Topics
● Output Topics
● Function
● State
● Log Topics
API - Native Java / Python / Go Function
Golang
Python
Java
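The code shown on this slide did not survive extraction. As a stand-in, here is a sketch in the spirit of the "exclamation" example from the Pulsar documentation: a native (SDK-less) Python function is just a plain function with no Pulsar imports, called once per input message, whose return value becomes the output message. The parameter name is illustrative, and deployment through pulsar-admin is not shown.

```python
def process(msg):
    """SDK-less function: one input message in, one output message out."""
    # The returned value is what the runtime publishes to the output topic.
    return "{}!".format(msg)


if __name__ == "__main__":
    # Local check without a Pulsar cluster.
    print(process("hello"))  # hello!
```

Because the function has no framework dependency, it can be unit-tested like any other Python function before it is deployed.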
API - Function Context
● Logger
● State
● Metrics
● Security / Secrets
● ...
Context - Logger
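The logger slide's content was an image. A minimal sketch of a function using the context logger, assuming a hypothetical `StubContext` whose `get_logger()` (the method name used by the Python SDK's Context) returns a standard `logging.Logger`. Collecting formatted records in a list stands in for a function's log topic; in a real deployment the runtime decides where log records go.

```python
import logging


class LogTopicHandler(logging.Handler):
    """Hypothetical stand-in for a log topic: keeps formatted records in a list."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))


class StubContext:
    """Hypothetical test double exposing only get_logger()."""

    def __init__(self, function_name="sketch"):
        self.log_topic = LogTopicHandler()
        self.log_topic.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        # Unique logger name so separate contexts do not share handlers.
        self._logger = logging.getLogger(f"function.{function_name}.{id(self)}")
        self._logger.setLevel(logging.INFO)
        self._logger.propagate = False
        self._logger.addHandler(self.log_topic)

    def get_logger(self):
        return self._logger


def process(msg, context):
    logger = context.get_logger()
    if not msg:
        logger.warning("skipping empty message")
        return None
    logger.info("processing %s", msg)
    return msg.upper()


ctx = StubContext()
process("hi", ctx)
process("", ctx)
print(ctx.log_topic.records)
```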
Context - State
● Global Managed State *
● Mutable by functions and via the admin CLI
● Queryable by functions and via the admin CLI
● State is stored in the storage layer
● State is implemented using streams + snapshots
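The last two bullets describe state built from streams plus snapshots. The sketch below models that idea in memory: each update is appended to a changelog (standing in for a stream), a snapshot captures the materialized table at some offset, and recovery loads the snapshot and replays only the tail written after it. `StreamBackedState` and its methods are hypothetical; this is not Pulsar's storage-layer code.

```python
class StreamBackedState:
    """Hypothetical key/value state: updates go to an append-only changelog;
    snapshots bound how much of the changelog must be replayed on recovery."""

    def __init__(self):
        self.changelog = []        # append-only (key, value) updates
        self.snapshot = {}         # materialized state as of snapshot_offset
        self.snapshot_offset = 0   # number of changelog entries covered by the snapshot
        self.table = {}            # live in-memory view

    def put(self, key, value):
        self.changelog.append((key, value))
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)

    def take_snapshot(self):
        self.snapshot = dict(self.table)
        self.snapshot_offset = len(self.changelog)

    @classmethod
    def recover(cls, changelog, snapshot, snapshot_offset):
        state = cls()
        state.changelog = list(changelog)
        state.snapshot = dict(snapshot)
        state.snapshot_offset = snapshot_offset
        state.table = dict(snapshot)
        # Replay only the part of the stream written after the snapshot.
        for key, value in changelog[snapshot_offset:]:
            state.table[key] = value
        return state


s = StreamBackedState()
s.put("a", 1)
s.put("b", 2)
s.take_snapshot()
s.put("a", 3)
restored = StreamBackedState.recover(s.changelog, s.snapshot, s.snapshot_offset)
print(restored.get("a"), restored.get("b"))  # 3 2
```

The trade-off is the usual one: frequent snapshots shorten recovery, while the stream keeps every update durable between snapshots.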
Context - State API
● Key/Value State API
○ putState
○ getState
● Counter State API
○ getCounter
○ incrCounter
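The Java names on the slide (putState, getState, getCounter, incrCounter) correspond to snake_case methods on the Python SDK's Context (`put_state`, `get_state`, `get_counter`, `incr_counter`). The sketch below shows a word count written against those method names. `InMemoryStateContext` is a hypothetical test double, not the SDK; storing values as bytes is an assumption that mirrors the byte-oriented Java API.

```python
class InMemoryStateContext:
    """Hypothetical test double for the state part of a function Context."""

    def __init__(self):
        self._counters = {}
        self._state = {}

    def incr_counter(self, key, amount):
        self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        return self._counters.get(key, 0)

    def put_state(self, key, value):
        self._state[key] = value

    def get_state(self, key):
        return self._state.get(key)


def word_count(sentence, context):
    """Count words across every message seen so far."""
    # Counter State API: one counter per word.
    for word in sentence.split():
        context.incr_counter(word, 1)
    # Key/Value State API: remember the last sentence processed (as bytes).
    context.put_state("last-sentence", sentence.encode("utf-8"))


ctx = InMemoryStateContext()
word_count("hello pulsar", ctx)
word_count("hello functions", ctx)
print(ctx.get_counter("hello"), ctx.get_state("last-sentence"))
```

Because the state is managed outside the function instance, the counts would survive an instance restart in a real deployment; the in-memory double only mimics the API shape.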
Context - Metrics
● API - recordMetric(String metricName, double value)
● Exposed in Prometheus format
● Collected by Prometheus
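A sketch of the metrics flow described above: a function calls `record_metric` (the Python SDK's snake_case form of recordMetric) for each message, and the samples are rendered in the Prometheus text format as a summary with `_sum` and `_count` lines. `MetricsContext` and the `function_user_metric_` name prefix are hypothetical; the metric names Pulsar actually exports may differ.

```python
from collections import defaultdict


class MetricsContext:
    """Hypothetical test double: record_metric() mirrors the SDK method name."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record_metric(self, metric_name, metric_value):
        self._samples[metric_name].append(float(metric_value))

    def to_prometheus_text(self, prefix="function_user_metric_"):
        """Render each metric as a Prometheus summary with only _sum and _count."""
        lines = []
        for name in sorted(self._samples):
            values = self._samples[name]
            full_name = prefix + name
            lines.append(f"# TYPE {full_name} summary")
            lines.append(f"{full_name}_sum {sum(values)}")
            lines.append(f"{full_name}_count {len(values)}")
        return "\n".join(lines) + "\n"


def process(msg, context):
    # One sample per message: the payload length.
    context.record_metric("message_length", len(msg))
    return msg


ctx = MetricsContext()
process("abc", ctx)
process("hello", ctx)
print(ctx.to_prometheus_text())
```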
Flexible Runtime
● Colocate with Broker - Thread & Process
● Managed Function Workers - Thread & Process
● External Schedulers - Container
○ Kubernetes
Colocate with Brokers
Managed Function Workers
External Schedulers - Kubernetes
Event Routing
● Events are routed to different partitions
● Leverage Pulsar’s MessageRouter
● Built-in MessageRouters
○ Round-Robin
○ SinglePartition
○ Hash (Murmur3 32-bit)
● Custom MessageRouters
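The routing options above can be sketched as small router classes with a common `choose_partition(key, num_partitions)` method. Pulsar's Java `MessageRouter` interface has a `choosePartition` method that receives the message and topic metadata; the Python classes here are hypothetical, simplified analogues. The hash is a plain MurmurHash3 (x86, 32-bit) implementation, so partition numbers will not necessarily match what a Pulsar producer computes. Pulsar's built-in round-robin and single-partition routers also hash keyed messages; the sketch leaves that out.

```python
import itertools

MASK = 0xFFFFFFFF


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit) of `data`, as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK, 15)
        h ^= (k * c2) & MASK
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK
    tail = data[4 * n_blocks:]
    if tail:
        k = _rotl32((int.from_bytes(tail, "little") * c1) & MASK, 15)
        h ^= (k * c2) & MASK
    # Finalization mix so every input bit affects every output bit.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h


class RoundRobinRouter:
    """Cycles through partitions, ignoring the key (simplified)."""

    def __init__(self):
        self._counter = itertools.count()

    def choose_partition(self, key, num_partitions):
        return next(self._counter) % num_partitions


class SinglePartitionRouter:
    """Sends everything to one fixed partition, ignoring the key (simplified)."""

    def __init__(self, partition):
        self.partition = partition

    def choose_partition(self, key, num_partitions):
        return self.partition % num_partitions


class HashRouter:
    """Same key, same partition."""

    def choose_partition(self, key, num_partitions):
        return murmur3_32(key.encode("utf-8")) % num_partitions


class KeyPrefixRouter:
    """Hypothetical custom router: 'vip-' keys get partition 0, others hash over the rest."""

    def __init__(self):
        self._hash = HashRouter()

    def choose_partition(self, key, num_partitions):
        if num_partitions == 1 or key.startswith("vip-"):
            return 0
        return 1 + self._hash.choose_partition(key, num_partitions - 1)
```

Hash routing is what keeps all events for one key in one partition, which in turn keeps them ordered for whichever function instance consumes that partition.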
Auto load balancing
● Pulsar Functions use Pulsar’s auto-balancing mechanism on consumers
● Shared Subscription
○ Load is distributed evenly among function instances (consumers)
○ More function instances provide more processing capacity
● Failover Subscription
○ Load is distributed among function instances (consumers) by partition
○ The number of active function instances is limited by the number of partitions
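The capacity difference between the two subscription types can be checked with a small model. For Failover, each partition is given one active instance by taking the partition index modulo the sorted instance list; this is a simplification (the broker's real selection also considers consumer priority). For Shared, messages are spread round-robin over all instances. The function names are hypothetical.

```python
def failover_assignment(num_partitions, instances):
    """Failover: exactly one active instance per partition (simplified selection)."""
    ordered = sorted(instances)
    return {p: ordered[p % len(ordered)] for p in range(num_partitions)}


def active_instances_failover(num_partitions, instances):
    return set(failover_assignment(num_partitions, instances).values())


def shared_load(num_messages, instances):
    """Shared: messages are spread round-robin over every instance."""
    load = {name: 0 for name in instances}
    for i in range(num_messages):
        load[instances[i % len(instances)]] += 1
    return load


instances = ["i1", "i2", "i3", "i4", "i5"]
print(active_instances_failover(3, instances))  # only 3 of 5 instances do work
print(shared_load(10, instances))               # all 5 instances share the load
```

With 3 partitions, two of the five instances sit idle as standbys under Failover, while Shared puts every instance to work at the cost of per-partition ordering.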
Integrated with pulsar-admin CLI
Pulsar Functions Architecture
● Stream-First Design
● Leverages existing infrastructure; no external dependencies
● Core components
○ Function metadata management
○ Function Worker membership and coordination
○ Function Assignment
Function metadata management
● Store function metadata
● Key/value store backed by a Pulsar topic
○ Fully qualified function name (FQFN) as the store key
● Uses topic compaction to compact the function metadata store
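A key/value store backed by a topic works by treating each message as "set this key to this value" and letting compaction keep only the latest message per key. The sketch below models that on a plain list, with `None` standing in for a tombstone (Pulsar's compaction treats a keyed message with an empty payload as a delete). The FQFN format `tenant/namespace/name` follows Pulsar's naming; the metadata dicts and helper names are hypothetical.

```python
def fqfn(tenant, namespace, name):
    """Fully qualified function name, used as the store key."""
    return f"{tenant}/{namespace}/{name}"


def compact(log):
    """Keep only the latest value per key, in offset order; None deletes the key."""
    last = {}
    for offset, (key, value) in enumerate(log):
        last[key] = (offset, value)
    kept = sorted((offset, key, value) for key, (offset, value) in last.items()
                  if value is not None)
    return [(key, value) for _, key, value in kept]


def materialize(log):
    """Rebuild the metadata store by reading the (compacted) topic."""
    return dict(compact(log))


a = fqfn("public", "default", "exclamation")
b = fqfn("public", "default", "word-count")
metadata_topic = [
    (a, {"parallelism": 1}),
    (b, {"parallelism": 2}),
    (a, {"parallelism": 3}),  # update: supersedes the first entry for a
    (b, None),                # delete: tombstone for b
]
print(materialize(metadata_topic))
```

Compaction is what keeps startup cheap: a new worker reads one message per live function instead of the full update history.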
Function Worker Membership
● Manages the membership of function workers
● Every function worker subscribes to a coordination topic
● The Pulsar broker tracks the alive consumers of each subscription
● The list of function workers is obtained by querying the alive consumers of the subscription to the coordination topic
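The membership scheme above needs no separate registry: the set of live workers is simply the set of consumers currently connected to one subscription. A hypothetical in-memory model of the broker's view of that subscription:

```python
class CoordinationTopic:
    """Hypothetical model of one subscription on the coordination topic:
    it only tracks which consumers (workers) are currently connected."""

    def __init__(self):
        self._consumers = []  # in connection order

    def subscribe(self, worker_id):
        if worker_id not in self._consumers:
            self._consumers.append(worker_id)

    def disconnect(self, worker_id):
        # A crashed or stopped worker's consumer goes away, so it leaves the membership.
        if worker_id in self._consumers:
            self._consumers.remove(worker_id)

    def alive_consumers(self):
        return list(self._consumers)


def current_membership(topic):
    """The worker list is just the alive consumers of the subscription."""
    return sorted(topic.alive_consumers())


topic = CoordinationTopic()
for worker in ("worker-2", "worker-1", "worker-3"):
    topic.subscribe(worker)
topic.disconnect("worker-2")
print(current_membership(topic))  # ['worker-1', 'worker-3']
```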
Function Assignment
● Function workers elect a leader as the scheduler manager by using a “Failover” subscription
● The scheduler manager computes the assignments using function metadata and membership
● The assignments are published to the Assignment Topic
● Each function worker receives its assignments by subscribing to the Assignment Topic
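The assignment steps can be sketched end to end with plain data structures. Assumptions, all hypothetical: the leader is the first live worker in connection order (standing in for the single active consumer of the Failover subscription), the scheduling policy is simple round-robin over live workers, the Assignment Topic is a list of messages each carrying the full assignment map, and instance labels use a `fqfn:instanceId` form for illustration.

```python
def elect_leader(alive_workers):
    """Failover subscription: one active consumer becomes the scheduler manager."""
    return alive_workers[0] if alive_workers else None


def compute_assignments(functions, workers):
    """Round-robin each function instance over live workers.
    functions: {fqfn: parallelism}; returns {worker: [instance labels]}."""
    if not workers:
        return {}
    ordered = sorted(workers)
    assignments = {w: [] for w in ordered}
    slot = 0
    for name in sorted(functions):
        for instance_id in range(functions[name]):
            assignments[ordered[slot % len(ordered)]].append(f"{name}:{instance_id}")
            slot += 1
    return assignments


def publish_assignments(assignment_topic, assignments):
    # The leader appends the full assignment map as one message.
    assignment_topic.append(assignments)


def read_my_assignments(assignment_topic, worker):
    # Each worker applies the latest message and keeps only its own share.
    return assignment_topic[-1].get(worker, []) if assignment_topic else []


workers = ["w1", "w2"]
functions = {"public/default/f": 3, "public/default/g": 1}
assignment_topic = []
if elect_leader(workers) == "w1":  # only the leader schedules
    publish_assignments(assignment_topic, compute_assignments(functions, workers))
print(read_my_assignments(assignment_topic, "w2"))
```

Publishing the whole map in one message keeps each worker's view consistent: a worker never sees half of a scheduling decision.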
Function Runtime Manager
● The Function Runtime Manager manages the running function instances
● It receives assignments from the Assignment Topic
● It compares its current running state with the assignments and reacts accordingly
○ Starts function instances to invoke functions
○ Stops function instances
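The compare-and-react step is a reconciliation loop: diff the desired set (assignments) against the actual set (running instances), then start what is missing and stop what is extra. A minimal sketch; `RuntimeManager` is hypothetical, and "starting" or "stopping" here only updates a set rather than launching threads, processes, or containers.

```python
def reconcile(running, assigned):
    """Return (to_start, to_stop) so that running == assigned afterwards."""
    running, assigned = set(running), set(assigned)
    return sorted(assigned - running), sorted(running - assigned)


class RuntimeManager:
    """Hypothetical runtime manager that tracks running instance labels."""

    def __init__(self):
        self.running = set()

    def on_assignment(self, assigned):
        to_start, to_stop = reconcile(self.running, assigned)
        for instance in to_stop:
            self.running.discard(instance)  # stop the function instance
        for instance in to_start:
            self.running.add(instance)      # start the function instance
        return to_start, to_stop


manager = RuntimeManager()
print(manager.on_assignment(["f:0", "f:1"]))  # (['f:0', 'f:1'], [])
print(manager.on_assignment(["f:1", "g:0"]))  # (['g:0'], ['f:0'])
```

Because the diff is computed from the full desired state each time, replaying the same assignment message is harmless: it produces nothing to start and nothing to stop.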
Use Cases
Content Routing
Message Filtering
Transformation
Alerts and thresholds
Complex Event Processing Pipelines
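The use-case slides were images. A sketch combining several of the patterns in one function: filtering out malformed events, transforming the payload, and routing by content to an alerts topic when a threshold is crossed. The topic names, the reading format, and the threshold are hypothetical; `RoutingContext` is a test double whose `publish(topic, message)` mirrors the shape of the Python SDK's `Context.publish`.

```python
from collections import defaultdict

# Hypothetical topic names used only for this sketch.
ALERTS_TOPIC = "persistent://public/default/sensor-alerts"
NORMAL_TOPIC = "persistent://public/default/sensor-normal"


class RoutingContext:
    """Hypothetical test double that records published messages per topic."""

    def __init__(self):
        self.published = defaultdict(list)

    def publish(self, topic, message):
        self.published[topic].append(message)


def route_reading(reading, context, threshold=100.0):
    """Filter, transform, and route one sensor reading (a dict)."""
    # Message filtering: drop events that lack a value.
    if "value" not in reading:
        return None
    # Transformation: normalize the payload shape and value type.
    event = {"sensor": reading["sensor"], "value": float(reading["value"])}
    # Content routing with a simple alert threshold.
    topic = ALERTS_TOPIC if event["value"] > threshold else NORMAL_TOPIC
    context.publish(topic, event)
    return None


ctx = RoutingContext()
for reading in ({"sensor": "s1", "value": "120"}, {"sensor": "s2", "value": 20}, {"sensor": "s3"}):
    route_reading(reading, ctx)
print(dict(ctx.published))
```

Chaining functions like this one, each reading the previous one's output topic, is how the pipeline-style use case would be assembled.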
Pulsar Functions Summary
● A serverless approach to event streaming
● Flexible, lightweight, easy to understand and use
● Event-first, Stream-first
● Stateless + Stateful (*)
● Flexible runtime and data locality
● Functions can be orchestrated to do complex processing
○ Workflow, DAG, Iterations, Graph, and ...
Pulsar Functions Roadmap
● Support for more languages
● Function Orchestration
● Managed state vs Local state
● Large state
● Transactional Processing
● RL on Functions
● ...
Community
● Twitter: @apache_pulsar , @streamnativeio
● Wechat: ApachePulsar, StreamNative
● Mailing Lists: dev@pulsar.apache.org users@pulsar.apache.org
● Slack: https://apache-pulsar.slack.com/
● Github:
○ https://github.com/apache/pulsar
○ https://github.com/apache/bookkeeper
● Documentation: https://pulsar.apache.org
Thanks!
