Stream or segment: what is the best way to access your events in Pulsar? (Neng)
- 2. Who Am I
❏ StreamNative Software Engineer
❏ Ex-Twitter
❏ Contributed to Apache Projects - Heron, Pulsar
❏ Interested in event streaming technologies
- 7. Pulsar Use Cases
❏ Unified Event Center/Bus (Queuing + Streaming)
❏ Billing Service
❏ Push Notification
❏ Worker Queue
❏ Logging Pipeline
❏ IoT
❏ Streaming-first, unified data processing
- 12. Data Processing Categories
❏ Interactive
  ❏ Time critical
  ❏ Medium data size
  ❏ Rerun on failures
❏ Batch
  ❏ The amount of data is huge
  ❏ Can run on a huge cluster
  ❏ Fine-grained fault tolerance
❏ Streaming
  ❏ Long-running jobs
  ❏ Time critical
  ❏ Scalability as well as fault tolerance
❏ Serverless
  ❏ Simple, lightweight processing
  ❏ Processing high-velocity data
- 14. Pulsar Messaging API
❏ Read data from brokers with different Subscription Modes
❏ Consume / Seek / Receive
❏ Reprocessing data by rewinding (seeking) the cursors
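The receive-then-rewind pattern on this slide can be sketched with a toy in-memory model. This is a minimal illustration only: `ToyTopic`, `ToySubscription`, and `drain` are hypothetical names invented here, integer positions stand in for message IDs, and nothing below is the Pulsar client API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Append-only log standing in for one topic partition (hypothetical)."""
    messages: list = field(default_factory=list)

    def publish(self, payload):
        self.messages.append(payload)
        return len(self.messages) - 1  # position of the new message


@dataclass
class ToySubscription:
    """A cursor over a ToyTopic; an illustration, not the real consumer API."""
    topic: ToyTopic
    cursor: int = 0  # index of the next message to deliver

    def receive(self):
        """Return (position, payload) for the next message, or None if caught up."""
        if self.cursor >= len(self.topic.messages):
            return None
        position = self.cursor
        self.cursor += 1
        return position, self.topic.messages[position]

    def seek(self, position):
        """Move the cursor so that reading resumes at the given position."""
        if not 0 <= position <= len(self.topic.messages):
            raise ValueError("position out of range")
        self.cursor = position


def drain(subscription):
    """Receive until the subscription is caught up and return the payloads."""
    received = []
    while (item := subscription.receive()) is not None:
        received.append(item[1])
    return received


topic = ToyTopic()
for event in ["e0", "e1", "e2", "e3"]:
    topic.publish(event)

sub = ToySubscription(topic)
first_pass = drain(sub)    # consumes e0 through e3
sub.seek(2)                # rewind the cursor to position 2
second_pass = drain(sub)   # reprocesses e2 and e3
```

The point of the sketch is that reprocessing needs no copy of the data: the log is untouched and only the subscription's cursor moves.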
- 16. Pulsar Segment API
❏ Read data from storage (bookkeeper or tiered storage)
❏ Fine-grained Parallelism
❏ Predicate pushdown (publish timestamp)
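A rough sketch of what fine-grained parallelism and publish-timestamp pushdown could look like, using a toy in-memory model. `ToySegment`, `prune`, `read_segment`, and `segment_scan` are hypothetical names, each segment is assumed to be non-empty with entries in publish order, and none of this is a real Pulsar or BookKeeper API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySegment:
    """Hypothetical ledger: non-empty (publish_ts_ms, payload) entries in publish order."""
    segment_id: int
    entries: tuple

    @property
    def min_ts(self):
        return self.entries[0][0]

    @property
    def max_ts(self):
        return self.entries[-1][0]


def prune(segments, start_ts, end_ts):
    """Pushdown step: keep only segments whose timestamp range overlaps [start_ts, end_ts]."""
    return [s for s in segments if s.max_ts >= start_ts and s.min_ts <= end_ts]


def read_segment(segment, start_ts, end_ts):
    """Read one segment independently of the others, filtering entries by timestamp."""
    return [payload for ts, payload in segment.entries if start_ts <= ts <= end_ts]


def segment_scan(segments, start_ts, end_ts, workers=4):
    """Scan the surviving segments in parallel; results come back in segment order."""
    selected = prune(segments, start_ts, end_ts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda s: read_segment(s, start_ts, end_ts), selected)
        return [payload for chunk in chunks for payload in chunk]


segments = [
    ToySegment(0, ((100, "a"), (200, "b"))),
    ToySegment(1, ((300, "c"), (400, "d"))),
    ToySegment(2, ((500, "e"), (600, "f"))),
]
selected_ids = [s.segment_id for s in prune(segments, 350, 550)]
result = segment_scan(segments, 350, 550)
```

Segment 0 is skipped without being read because its newest entry is older than the requested range, and the remaining segments are read by separate workers since sealed segments do not depend on each other.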
- 17. Segment Centric Storage
❏ Topic Partition (Managed Ledger)
  ❏ The storage layer for a single topic partition
❏ Segment (Ledger)
  ❏ Single writer, append-only
  ❏ Replicated to multiple bookies
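The relationship between a topic partition and its segments can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyLedger` and `ToyManagedLedger` are hypothetical classes, the `max_entries` rollover threshold and the round-robin bookie choice are simplifications, and real BookKeeper placement and quorum handling are more involved.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyLedger:
    """Hypothetical segment: append-only, sealed on close, stored on a fixed set of bookies."""
    ledger_id: int
    bookies: tuple
    entries: list = field(default_factory=list)
    closed: bool = False

    def append(self, payload):
        if self.closed:
            raise RuntimeError("ledger is sealed; segments are append-only")
        self.entries.append(payload)
        return len(self.entries) - 1  # entry id within this ledger


class ToyManagedLedger:
    """Hypothetical topic partition: an ordered list of ledgers, one open for writes."""

    def __init__(self, bookies, replicas=2, max_entries=3):
        self.bookies = list(bookies)
        self.replicas = replicas
        self.max_entries = max_entries  # rollover threshold, an illustrative knob
        self.ledgers = []
        self._ids = count()
        self._roll()

    def add_bookie(self, name):
        """A new bookie only receives segments created after it joins."""
        self.bookies.append(name)

    def _roll(self):
        """Seal the current ledger (if any) and open a new one on the next bookies."""
        if self.ledgers:
            self.ledgers[-1].closed = True
        ledger_id = next(self._ids)
        start = ledger_id % len(self.bookies)
        ensemble = tuple(
            self.bookies[(start + i) % len(self.bookies)] for i in range(self.replicas)
        )
        self.ledgers.append(ToyLedger(ledger_id, ensemble))

    def publish(self, payload):
        """Append to the open ledger, rolling over first if it is full."""
        if len(self.ledgers[-1].entries) >= self.max_entries:
            self._roll()
        current = self.ledgers[-1]
        return current.ledger_id, current.append(payload)


partition = ToyManagedLedger(["bookie1", "bookie2", "bookie3"])
positions = [partition.publish(f"event-{i}") for i in range(4)]
before = [ledger.bookies for ledger in partition.ledgers]
partition.add_bookie("bookie4")
positions += [partition.publish(f"event-{i}") for i in range(4, 8)]
```

In this model, adding `bookie4` leaves the sealed segments where they are and only affects where later segments land, which is the intuition behind scaling storage without rebalancing existing data.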
- 19. Apache Pulsar Data APIs
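[Diagram: Apache Pulsar data APIs]
❏ Messaging API: Producer and Consumer connect to the brokers (Broker 1, Broker 2, Broker 3)
❏ Segment API: reads from storage, either the bookies (Bookie1 to Bookie5) or tiered storage (Hadoop, GCS, S3)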
- 26. Benefits
❏ Unlimited Topic Partition Storage
❏ Instant Scaling without Data Rebalancing
❏ Broker Failure Recovery
❏ Bookie Failure Recovery
❏ Cluster Expansion
❏ Low latency reading for messaging data
❏ High throughput reading for batch data
❏ Reduced cost for whole data storage
- 29. Conclusion
❏ Apache Pulsar is a cloud-native messaging and streaming system
  ❏ Multi-layered architecture
  ❏ Segment-centric storage
  ❏ Two levels of reading API: Pub/Sub + Segment
❏ Apache Pulsar provides a unified view of data
- 30. Community
❏ Pulsar Website: https://pulsar.apache.org
❏ Twitter: @apache_pulsar / @streamnativeio
❏ Slack: https://apache-pulsar.herokuapp.com
❏ Mailing lists: dev@pulsar.apache.org, users@pulsar.apache.org
❏ GitHub: https://github.com/apache/pulsar
❏ Medium: https://medium.com/streamnative