TGIPulsar - EP #006: Lifecycle of a Pulsar message
streamnative.io
Lifecycle of a Pulsar Message
#TGIPulsar EP-006
Message Lifecycle
✓ Message Flow
✓ Message Retention
Message Flow
Brokers + Bookies
Bookie 0 Bookie 1 Bookie 2
The processes for storing
data are called bookies. They
persist data for Pulsar.
Broker 0 Broker 1 Broker 2
Brokers are “stateless”. They
serve clients for producing and
consuming events.
ZooKeeper
ZooKeeper is used for storing the
metadata for Pulsar and
BookKeeper, as well as for
discovering brokers and bookies.
Pulsar Producer 0 Producer 1
Topic
Partition 0 Partition 1 Partition 2
Broker X Broker Y Broker Z
Subscription A
Consumer (P012)
Produce Producer 0 Producer 1
Topic
Partition 0 Partition 1 Partition 2
Broker 0 Broker 1 Broker 2
Bookie 0 Bookie 1 Bookie 2
1. A message is created and a
partition is selected.
2. The message is sent to the
owner broker that serves the
selected partition.
3. The message is written to N bookies in
parallel by the owner broker. The message
is written once and stored in its entirety.
4. Once the message has been
written by 2 bookies, the broker
will acknowledge the message to the producer.
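
The produce steps above can be sketched as a toy model. This is not the Pulsar or BookKeeper API: `Bookie`, `broker_write` and `ack_quorum` are hypothetical names, and the writes here run one after another rather than in parallel. It only illustrates that every bookie stores the whole message and that the broker acknowledges once 2 bookies have confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class Bookie:
    """Toy stand-in for a bookie: keeps whole entries in a dict."""
    name: str
    alive: bool = True
    entries: dict = field(default_factory=dict)

    def add_entry(self, entry_id: int, payload: bytes) -> bool:
        if not self.alive:
            return False
        self.entries[entry_id] = payload  # the full message, not a fragment
        return True


def broker_write(bookies, entry_id, payload, ack_quorum=2):
    """Send the entry to every bookie; succeed once ack_quorum confirm."""
    confirmations = sum(b.add_entry(entry_id, payload) for b in bookies)
    return confirmations >= ack_quorum


bookies = [Bookie("bookie-0"), Bookie("bookie-1"), Bookie("bookie-2", alive=False)]
acked = broker_write(bookies, entry_id=0, payload=b"hello")
print(acked)  # True: two of the three bookies confirmed the write
```

With only one bookie alive the same call returns False, so the producer would not get an acknowledgment.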
Consume
(Cached)
Topic
Partition 0 Partition 1 Partition 2
Broker 0 Broker 1 Broker 2
Bookie 0 Bookie 1 Bookie 2
Consumer (P012)
1. The consumer subscribes to a
topic. It connects to the owner
brokers serving the partitions.
2. The broker sends messages for the
partition from its in-memory
cache.
3. Consumer acknowledges a
message after processing it.
Broker updates cursor once it
receives acknowledgment.
Consume
(BK)
Topic
Partition 0 Partition 1 Partition 2
Broker 0 Broker 1 Broker 2
Bookie 0 Bookie 1 Bookie 2
Consumer (P012)
1. The consumer subscribes to a
topic. It connects to the owner
brokers serving the partitions.
2. The broker does not have the data in
memory and will read from one
of the bookies that have the data.
3. Consumer acknowledges a
message after processing it.
Broker updates cursor once it
receives acknowledgment.
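
The two consume paths (cached and BK) can be sketched as one read function. This is a toy model under stated assumptions: `ToyBroker`, `bookie_store` and `bookie_reads` are hypothetical names, the cursor is a single integer marking the last acknowledged entry, and acknowledgments arrive in order.

```python
class ToyBroker:
    """Toy broker: serves reads from memory, falls back to bookie storage."""

    def __init__(self, bookie_store, cache=None):
        self.bookie_store = bookie_store   # entry_id -> payload, the persisted copy
        self.cache = dict(cache or {})     # recently written entries held in memory
        self.cursor = -1                   # last acknowledged entry id
        self.bookie_reads = 0              # how often we had to go to a bookie

    def read(self, entry_id):
        if entry_id in self.cache:         # Consume (Cached)
            return self.cache[entry_id]
        self.bookie_reads += 1             # Consume (BK)
        return self.bookie_store[entry_id]

    def ack(self, entry_id):
        # The cursor only moves forward as acknowledgments arrive.
        self.cursor = max(self.cursor, entry_id)


store = {0: b"a", 1: b"b", 2: b"c"}
broker = ToyBroker(store, cache={2: b"c"})
received = []
for entry_id in range(3):
    received.append(broker.read(entry_id))  # consumer processes the message...
    broker.ack(entry_id)                    # ...then acknowledges it

print(broker.bookie_reads, broker.cursor)  # 2 2
```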
Failures Producer 0 Producer 1
Topic
Partition 0 Partition 1 Partition 2
Broker 0 Broker 1 Broker 2
Bookie 0 Bookie 1 Bookie 2
In-flight messages will be
automatically retried by
Pulsar clients.
Brokers are stateless. A
broker process that dies
does not impact data storage.
Consumer (P012)
When a bookie dies, all the data
is still accessible from the other
replicas and will be re-replicated.
Message Retention
Message retention
✓ Retention
✓ TTL
✓ Message backlog
✓ Storage size
Subscription & Cursor
Partition (Event Stream)
Subscription A
(1, 1)
Subscription B
(2, 2)
Subscription C
(3, 2)
Subscription Initial Position
Partition (Event Stream)
Earliest
Partition (Event Stream)
SubscriptionInitialPosition
Earliest
Latest
Partition (Event Stream)
SubscriptionInitialPosition
Latest
Seek
Partition (Event Stream)
Subscription
(x, y)
Unsubscribe
Partition (Event Stream)
Subscription
(x, y)
Message retention (1)
Partition (Event Stream)
Subscription B
(2, 2)
Subscription C
(3, 2)
Message retention (2)
Partition (Event Stream)
Subscription B
(2, 2)
Subscription C
(3, 2)
NOT OK to delete
OK to delete
Message retention (3)
Partition (Event Stream)
Subscription B
(2, 2)
Subscription C
(3, 2)
NOT OK to delete
OK to delete
Message Retention
Message retention (4)
Partition (Event Stream)
Subscription B
(2, 2)
Subscription C
(3, 2)
Yet to be processed
OK to delete
Message Retention
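
The split between "OK to delete" and "yet to be processed" can be sketched from the cursor positions on these slides. Two assumptions that the slides do not state: each pair is read as (ledger id, entry id), and a cursor marks the last position its subscription has acknowledged. The function name is hypothetical. Messages acknowledged by every subscription are candidates for deletion, and the retention policy decides how long they are kept anyway.

```python
def split_by_slowest_cursor(positions, cursors):
    """Split positions into those acked by all subscriptions and the rest."""
    slowest = min(cursors.values())  # tuples compare ledger first, then entry
    acked_by_all = [p for p in positions if p <= slowest]
    still_needed = [p for p in positions if p > slowest]
    return acked_by_all, still_needed


# Hypothetical stream: three ledgers with three entries each.
positions = [(ledger, entry) for ledger in (1, 2, 3) for entry in (0, 1, 2)]
cursors = {"Subscription B": (2, 2), "Subscription C": (3, 2)}

acked_by_all, still_needed = split_by_slowest_cursor(positions, cursors)
print(acked_by_all[-1])  # (2, 2)
print(still_needed)      # [(3, 0), (3, 1), (3, 2)]
```

The slowest subscription (B here) decides where the boundary sits, no matter how far ahead the others are.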
Message retention (5)
Msg 1 to Msg 8: Acked
Msg 9 to Msg 11: Unacked
Deleted
Retention
Yet to be processed
Message expiry (1)
Msg 5 to Msg 8: Acked
Msg 9 to Msg 15: Unacked
Deleted
Retention
Not within TTL (may still be processed)
Within the applied TTL
Message expiry (2)
Msg 5 to Msg 12: Acked
Msg 13 to Msg 15: Unacked
Deleted
Retention
Not within TTL (may still be processed)
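
Message expiry can be sketched as a pass that marks old unacked messages as acknowledged. This is a toy model, not broker code: `Message`, `apply_ttl` and the timestamps are hypothetical, and the publish times are chosen so that Msg 9 to Msg 11 fall outside the TTL.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    publish_time: float  # seconds, on an arbitrary clock
    acked: bool = False


def apply_ttl(messages, now, ttl_seconds):
    """Mark unacked messages older than the TTL as acknowledged."""
    expired = []
    for m in messages:
        if not m.acked and now - m.publish_time > ttl_seconds:
            m.acked = True
            expired.append(m.msg_id)
    return expired


# Msg 9..15, one message published per second (made-up timestamps).
backlog = [Message(i, publish_time=float(i)) for i in range(9, 16)]
expired = apply_ttl(backlog, now=16.0, ttl_seconds=4.0)
print(expired)  # [9, 10, 11]
```

Once expired messages count as acked, they no longer hold back deletion of the data behind them.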
Backlog (1)
Msg 5 to Msg 8: Acked
Msg 9 to Msg 15: Unacked
Deleted
Retention
Yet to be processed
Backlog (2)
Msg 5 to Msg 8: Acked
Msg 9 to Msg 15: Unacked
Deleted
Retention
Yet to be processed
SUB 0 SUB 2
Backlog
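
A minimal sketch of backlog per subscription, assuming the backlog is the number of messages published after a subscription's cursor. The cursor values for SUB 0 and SUB 2 are made up for illustration, and integer message ids stand in for real positions.

```python
def backlog_per_subscription(last_msg_id, cursors):
    """Count messages each subscription has not yet acknowledged."""
    return {name: last_msg_id - cursor for name, cursor in cursors.items()}


# Hypothetical cursors: last acknowledged message id per subscription.
cursors = {"SUB 0": 8, "SUB 2": 11}
backlogs = backlog_per_subscription(last_msg_id=15, cursors=cursors)

print(backlogs)                # {'SUB 0': 7, 'SUB 2': 4}
print(max(backlogs.values()))  # 7
```

The largest value belongs to the slowest subscription, which is the one that keeps data from being deleted.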
Message deletion (1)
Msg 5 to Msg 15: Acked
Deleted
Retention
Message deletion (2)
Msg 5 to Msg 15: Acked
Deleted
Retention
Message deletion (3)
Msg 5 to Msg 15: Acked
Deleted
Retention
S1 S2 S3 S4
Message deletion (4)
Msg 5 to Msg 15: Acked
Deleted
Retention
S1 S2 S3 S4
Message deletion (5)
Msg 5 to Msg 15: Acked
Deleted
Retention
S1 S2 S3 S4
Storage size
✓ All the storage occupied by the “undeleted”
segments, in bytes
Message deletion
✓ Messages are deleted segment by segment
✓ The disk space of a segment is reclaimed by a
garbage collector thread after it is deleted
✓ The garbage collector runs periodically
○ gcWaitTime
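
Segment-by-segment deletion can be sketched as follows. The assignment of messages to S1..S4 is invented for illustration, retention is ignored, and the last segment is assumed to be still open for writes. The point is that a segment goes only when every message in it has been acknowledged by all subscriptions.

```python
def deletable_segments(segments, slowest_cursor):
    """Names of closed segments fully acknowledged by every subscription."""
    closed = list(segments.items())[:-1]  # assume the last segment is still open
    return [name for name, msg_ids in closed if max(msg_ids) <= slowest_cursor]


# Hypothetical layout of Msg 5..15 across four segments.
segments = {
    "S1": [5, 6, 7],
    "S2": [8, 9, 10],
    "S3": [11, 12, 13],
    "S4": [14, 15],
}

print(deletable_segments(segments, slowest_cursor=9))   # ['S1']
print(deletable_segments(segments, slowest_cursor=15))  # ['S1', 'S2', 'S3']
```

With the slowest cursor at Msg 9, S2 stays on disk even though Msg 8 and Msg 9 are already acked, because Msg 10 in the same segment is not.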
Retention settings
✓ Retention (broker / namespace)
○ defaultRetentionTimeInMinutes
○ defaultRetentionSizeInMB
✓ TTL (broker / namespace)
○ ttlDurationDefaultInSeconds
Trigger retention
✓ Ledger Rollover
○ managedLedgerMinLedgerRolloverTimeMinutes
○ managedLedgerMaxLedgerRolloverTimeMinutes
○ managedLedgerMaxEntriesPerLedger
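
A hedged sketch of how the three rollover settings might combine, based on my reading of the comments in the broker configuration: a rollover happens only after the minimum time has passed, and then when either the maximum time or the maximum entry count is reached. The function and its parameter names are hypothetical, the numbers are example values, and the size-based limit is left out.

```python
def should_roll_over(age_minutes, entries, *, min_minutes, max_minutes, max_entries):
    """Decide whether the current ledger should be closed and a new one opened."""
    if age_minutes < min_minutes:
        return False  # never roll over before the minimum time
    return age_minutes >= max_minutes or entries >= max_entries


limits = dict(min_minutes=10, max_minutes=240, max_entries=50_000)  # example values

print(should_roll_over(5, 60_000, **limits))   # False: too young
print(should_roll_over(15, 60_000, **limits))  # True: enough entries
print(should_roll_over(240, 100, **limits))    # True: maximum time reached
```

Rollover matters for retention because only closed ledgers can be deleted, so a ledger that never rolls over keeps its data on disk.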
Garbage collection settings
✓ Bookie settings
○ gcWaitTime
○ majorCompactionThreshold
○ majorCompactionInterval
○ minorCompactionThreshold
○ minorCompactionInterval
