Serverless Event Streaming with Pulsar Functions
- 2. Who am I
● Apache Pulsar PMC Member
● Apache BookKeeper PMC Chair
● Twitter, Yahoo Alumni
● Founder of StreamNative
● Interested in technologies around Event Streaming
- 3. Agenda
● What is Apache Pulsar?
● Event Stream - Pulsar view on Data
● When Event Streaming meets serverless
○ Programming Model
○ Architecture
○ Use cases
- 9. Pulsar - Flexible Messaging
● One copy of the data, different ways to consume it
● Queuing (aka stateless messaging)
○ Shared (* RabbitMQ)
● Streaming (aka stateful messaging)
○ Exclusive
○ Failover (* Kafka)
○ Key_Shared
- 10. Pulsar - Cloud Native Architecture
Layered Architecture
❏ Independent scalability
❏ Instance failure recovery
❏ Rebalance-free cluster expansion
- 23. Pulsar Functions
● A serverless event streaming framework
● Lightweight computation
● Event-first, Stream-first
● Multi-language
● Multi-runtime
● SDK-less and SDK-based APIs
- 25. API - Native Java / Python / Go Function
[Code samples: Golang, Python, Java]
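The code samples on this slide are not present in the transcript. As a minimal sketch of the SDK-less ("native") style, assuming the Python runtime's convention of a module-level function named `process` that receives one event payload and returns the output value:

```python
def process(input):
    """Native (SDK-less) function: one event in, one result out.

    No Pulsar import is needed; the runtime is assumed to call
    process() once per incoming message and publish the return value.
    """
    return "{}!".format(input)
```

Because the function is plain Python, it can be unit tested without a running cluster, for example by calling `process("hello")`.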
- 26. API - Function Context
● Logger
● State
● Metrics
● Security / Secrets
● ...
- 29. Context - State
● Global Managed State *
● Mutable by functions and admin-cli
● Queryable by functions and admin-cli
● State is stored at the storage layer
● State is implemented using streams + snapshots
- 30. Context - State API
● Key/Value State API
○ putState
○ getState
● Counter State API
○ getCounter
○ incrCounter
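The two state APIs above can be sketched with a word-count function. This is an illustration under assumptions: `InMemoryContext` is a hypothetical stand-in that keeps state in process memory so the example runs without a cluster, and its method names follow the snake_case spelling used by the Python SDK (`put_state`, `get_state`, `incr_counter`, `get_counter`) rather than the Java spelling on the slide.

```python
from collections import defaultdict


class InMemoryContext:
    """Hypothetical stand-in for the function Context (not the real SDK class)."""

    def __init__(self):
        self._kv = {}
        self._counters = defaultdict(int)

    # Key/Value State API
    def put_state(self, key, value):
        self._kv[key] = value

    def get_state(self, key):
        return self._kv.get(key)

    # Counter State API
    def incr_counter(self, key, amount):
        self._counters[key] += amount

    def get_counter(self, key):
        return self._counters[key]


def count_words(sentence, context):
    """Increment one counter per word and remember the last sentence seen."""
    for word in sentence.split():
        context.incr_counter(word, 1)
    context.put_state("last-sentence", sentence)
```

In a deployed function the counters would live in the managed state store instead of a local dictionary, which is what makes them queryable from outside the function.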
- 31. Context - Metrics
● API - recordMetric(String metricName, double value)
● Exposed in Prometheus format
● Collected by Prometheus
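To make the metrics flow concrete, here is a small sketch of recording user metrics and rendering them as Prometheus-style text lines. `MetricRecorder`, the `_sum` / `_count` suffixes, and the aggregation are assumptions chosen for illustration; they are not the metric names or labels that Pulsar itself emits.

```python
class MetricRecorder:
    """Hypothetical recorder: aggregates values per metric name."""

    def __init__(self):
        self._sum = {}
        self._count = {}

    def record_metric(self, metric_name, value):
        # Mirrors the shape of recordMetric(String metricName, double value).
        self._sum[metric_name] = self._sum.get(metric_name, 0.0) + float(value)
        self._count[metric_name] = self._count.get(metric_name, 0) + 1

    def render(self):
        """Render one '<name> <value>' line per series, sorted by metric name."""
        lines = []
        for name in sorted(self._sum):
            lines.append("{}_sum {}".format(name, self._sum[name]))
            lines.append("{}_count {}".format(name, self._count[name]))
        return "\n".join(lines)
```

A scraper would then collect the rendered text periodically; inside a function, the call site would simply be the Context's metric method with a name and a numeric value.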
- 32. Flexible Runtime
● Colocate with Broker - Thread & Process
● Managed Function Workers - Thread & Process
● External Schedulers - Container
○ Kubernetes
- 36. Event Routing
● Events are routed to different partitions
● Leverage Pulsar’s MessageRouter
● Existing MessageRouters
○ Round-Robin
○ SinglePartition
○ Hash (Murmur3, 32-bit)
● Customize MessageRouter
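The three routing modes listed above can be sketched as functions that pick a partition index. This is a simplified illustration, not Pulsar's MessageRouter interface: the function names are hypothetical, and `zlib.crc32` stands in for the Murmur3 hash because the Python standard library does not ship a Murmur3 implementation.

```python
import itertools
import zlib


def round_robin_router(num_partitions):
    """Cycle through partitions in order, ignoring the message key."""
    counter = itertools.count()

    def choose(key=None):
        return next(counter) % num_partitions

    return choose


def single_partition_router(partition):
    """Always send to the one partition chosen up front."""

    def choose(key=None):
        return partition

    return choose


def hash_router(num_partitions):
    """Send equal keys to the same partition (crc32 as a stand-in hash)."""

    def choose(key):
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose
```

A custom router would follow the same shape: take the message (or its key) and the partition count, and return an index in range.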
- 37. Auto load balancing
● Pulsar Functions use Pulsar’s auto-balancing mechanism on consumers
● Shared Subscription
○ Load is distributed among function instances (consumers) evenly
○ More function instances provide more processing capacity
● Failover Subscription
○ Load is distributed among function instances (consumers) by partition
○ The number of function instances is limited by the number of partitions
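The difference between the two subscription types can be sketched with a toy dispatcher. Both helpers are hypothetical, and the modulo-based placement is an assumption used only to show the shape of the distribution, not the broker's actual selection logic.

```python
def shared_dispatch(messages, instances):
    """Shared: individual messages are spread across all instances."""
    load = {inst: [] for inst in instances}
    for i, msg in enumerate(messages):
        load[instances[i % len(instances)]].append(msg)
    return load


def failover_owners(num_partitions, instances):
    """Failover: each partition is owned by exactly one active instance."""
    return {p: instances[p % len(instances)] for p in range(num_partitions)}
```

With two partitions and three instances, `failover_owners` leaves one instance without a partition, which is the limit described above; `shared_dispatch` keeps all three busy regardless of the partition count.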
- 40. Pulsar Functions Architecture
● Stream-First Design
● Leverages existing Pulsar infrastructure; no additional external dependencies
● Core components
○ Function metadata management
○ Function Worker membership and coordination
○ Function Assignment
- 41. Function metadata management
● Store function metadata
● Key/value store backed by a Pulsar topic
○ FQFN (fully qualified function name) as the store key
● Use topic compaction to compact the function metadata store
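A compacted topic behaves like a key/value store because only the latest record per key needs to be kept. The sketch below replays a log of (key, value) records into that latest-value view. It assumes an FQFN of the form tenant/namespace/name and treats a `None` value as a delete marker; both helper names are hypothetical.

```python
def fqfn(tenant, namespace, name):
    """Build the fully qualified function name used as the store key."""
    return "{}/{}/{}".format(tenant, namespace, name)


def compact(log):
    """Replay (key, value) records, keeping only the latest value per key.

    A value of None is treated as a tombstone that removes the key.
    """
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest
```

For example, two updates to the same function followed by a delete of another function compact down to a single entry holding the most recent metadata.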
- 42. Function Worker Membership
● Manages the membership of function workers
● Every function worker subscribes to a coordination topic
● The Pulsar broker tracks the live consumers of a subscription
● The list of function workers is obtained by querying the live consumers of the subscription on the coordination topic
- 43. Function Assignment
● Function workers elect a leader as the scheduler manager by using a “Failover” subscription
● The scheduler manager computes the assignments using function metadata and membership
● The assignments are published to the Assignment Topic
● Each function worker receives assignments by subscribing to the Assignment Topic
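The scheduling step can be sketched as a pure function from function metadata and worker membership to assignments. Round-robin placement, the `fqfn:instance` key format, and the function name `compute_assignments` are assumptions made for this illustration.

```python
def compute_assignments(functions, workers):
    """Place every function instance on a worker, round-robin.

    functions: dict mapping FQFN -> parallelism (number of instances)
    workers:   iterable of live worker ids (the membership)
    Returns a dict mapping "fqfn:instance_id" -> worker id.
    """
    ordered = sorted(workers)  # deterministic order across leaders
    assignments = {}
    slot = 0
    for name in sorted(functions):
        for instance_id in range(functions[name]):
            key = "{}:{}".format(name, instance_id)
            assignments[key] = ordered[slot % len(ordered)]
            slot += 1
    return assignments
```

The leader would publish the resulting mapping to the Assignment Topic; because the computation is deterministic, a newly elected leader with the same inputs produces the same assignments.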
- 44. Function Runtime Manager
● The Function Runtime Manager manages the running function instances
● It receives assignments from the Assignment Topic
● It compares its current running state with the assignments and reacts to the differences
○ Start function instances to invoke functions
○ Stop function instances
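The compare-and-react step amounts to a set difference between what is running and what is assigned. A minimal sketch, with `reconcile` as a hypothetical helper name and instance ids as plain strings:

```python
def reconcile(running, assigned):
    """Return (to_start, to_stop) for one worker.

    running:  instance ids currently running on this worker
    assigned: instance ids the latest assignments give to this worker
    """
    running, assigned = set(running), set(assigned)
    to_start = sorted(assigned - running)  # assigned but not yet running
    to_stop = sorted(running - assigned)   # running but no longer assigned
    return to_start, to_stop
```

Instances that appear in both sets are left untouched, so repeated delivery of the same assignment causes no restarts.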
- 51. Pulsar Functions Summary
● A serverless approach to event streaming
● Flexible, lightweight, easy to understand and use
● Event-first, Stream-first
● Stateless + Stateful (*)
● Flexible runtime and data locality
● Functions can be orchestrated to do complex processing
○ Workflow, DAG, Iterations, Graph, and ...
- 52. Pulsar Functions Roadmap
● Support for more languages
● Function Orchestration
● Managed state vs Local state
● Large state
● Transactional Processing
● RL on Functions
● ...
- 53. Community
● Twitter: @apache_pulsar, @streamnativeio
● WeChat: ApachePulsar, StreamNative
● Mailing Lists: dev@pulsar.apache.org, users@pulsar.apache.org
● Slack: https://apache-pulsar.slack.com/
● GitHub:
○ https://github.com/apache/pulsar
○ https://github.com/apache/bookkeeper
● Documentation: https://pulsar.apache.org