Transaction Support in Pulsar 2.5.0
- 1. Sijie Guo
Apache Pulsar / BookKeeper PMC Member
Transaction Support in Pulsar
Yong Zhang
Apache Pulsar Contributor
- 3. • At-most once
• At-least once
• Exactly once
Messaging Semantics
Before 1.20.0-incubating
- 4. • At-most once
• At-least once
• Exactly once
Messaging Semantics
PIP-6: Guaranteed Message Deduplication
- 18. • Broker can fail
• The request from Producer to Broker can fail
• Producer or Consumer can fail
Why are duplicates introduced?
- 20. • Producer: Idempotent Producer
• Broker: Guaranteed Message Deduplication (PIP-6)
• Consumer: Reader + Checkpoints (Flink / Spark)
Message Deduplication
- 21. • Producer Name - Identify who is producing the messages
• Sequence ID - Identify the message
• Producer Name + Sequence ID: The unique identifier for a
message
Idempotent Producer
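As a rough illustration of the producer side, the sketch below models an idempotent producer in plain Java rather than with the Pulsar client. The Pulsar Java client exposes the same two knobs as `ProducerBuilder.producerName(...)` and `TypedMessageBuilder.sequenceId(...)`. The class names `IdempotentProducerSketch` and `OutgoingMessage` are hypothetical. The point shown: a retry resends the same message with the same sequence ID, so Producer Name + Sequence ID identifies it uniquely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stdlib-only model of an idempotent producer (not the Pulsar client API).
final class IdempotentProducerSketch {
    private final String producerName;
    private long nextSequenceId = 0;
    // Every send attempt as it would go over the wire, retries included.
    final List<OutgoingMessage> wire = new ArrayList<>();

    IdempotentProducerSketch(String producerName) {
        this.producerName = producerName;
    }

    // A new logical message gets the next sequence ID exactly once.
    OutgoingMessage send(String payload) {
        OutgoingMessage msg = new OutgoingMessage(producerName, nextSequenceId++, payload);
        wire.add(msg);
        return msg;
    }

    // A retry resends the same message, so it carries the same sequence ID.
    void retry(OutgoingMessage msg) {
        wire.add(msg);
    }

    public static void main(String[] args) {
        IdempotentProducerSketch producer = new IdempotentProducerSketch("producer-a");
        OutgoingMessage first = producer.send("hello");
        producer.retry(first); // e.g. the first attempt timed out
        producer.send("world");
        for (OutgoingMessage m : producer.wire) {
            System.out.println(m.uniqueId() + " " + m.payload);
        }
    }
}

// Hypothetical message envelope.
final class OutgoingMessage {
    final String producerName; // who is producing
    final long sequenceId;     // which message, within that producer
    final String payload;

    OutgoingMessage(String producerName, long sequenceId, String payload) {
        this.producerName = producerName;
        this.sequenceId = sequenceId;
        this.payload = payload;
    }

    // Producer Name + Sequence ID is the unique identifier for a message.
    String uniqueId() {
        return producerName + "/" + sequenceId;
    }
}
```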
- 22. • Broker maintains a map between Producer Name and Last-Produced-Sequence-ID
• Broker accepts a message if its sequence ID is larger than the producer's last produced sequence ID
• Broker treats messages whose sequence IDs are not larger than the last produced sequence ID as duplicates and does not persist them
• Broker keeps the map in a de-duplication cursor (stored in BookKeeper)
Guaranteed Message Deduplication
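A minimal in-memory sketch of the broker-side rule above, assuming a single partition and one map entry per producer. `DedupBrokerSketch` and `tryPersist` are hypothetical names, and persisting the map to the de-duplication cursor in BookKeeper is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of broker-side deduplication (PIP-6 style).
// Persisting the map in a BookKeeper-backed de-duplication cursor is omitted.
final class DedupBrokerSketch {
    // Producer Name -> Last-Produced-Sequence-ID
    private final Map<String, Long> lastSequenceIdByProducer = new HashMap<>();

    /** Returns true if the message is persisted, false if it is dropped as a duplicate. */
    boolean tryPersist(String producerName, long sequenceId) {
        Long last = lastSequenceIdByProducer.get(producerName);
        if (last != null && sequenceId <= last) {
            return false; // not larger than the last produced sequence ID: duplicate
        }
        lastSequenceIdByProducer.put(producerName, sequenceId);
        return true;
    }

    public static void main(String[] args) {
        DedupBrokerSketch broker = new DedupBrokerSketch();
        long[] attempts = {0, 1, 1, 2}; // sequence ID 1 is retried once
        for (long seq : attempts) {
            System.out.println("seq " + seq + " persisted=" + broker.tryPersist("producer-a", seq));
        }
    }
}
```

Running `main` persists sequence IDs 0, 1 and 2 once each and drops the retried 1. Sequence IDs are tracked per producer, so two producers may use the same numbers without interfering.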
- 32. • It only works when producing messages to one partition
• It only works for producing one message
• There is no atomicity when producing multiple messages to
one partition or many partitions
• Consumers are required to store the MessageId along with
its state and seek back to the MessageId when restoring
the state
Limitations
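To make the last limitation concrete, here is a hypothetical stdlib-only sketch of what such a consumer has to do by hand: it stores its state (a running sum) together with the position of the last message folded into it, and after a restart it seeks back to the position after that one. A list index stands in for a Pulsar `MessageId` and the `Checkpoint` object stands in for durable storage; both are assumptions for illustration.

```java
import java.util.List;

// Hypothetical model: a list index stands in for a Pulsar MessageId,
// and the Checkpoint object stands in for durable external state storage.
final class CheckpointingConsumerSketch {
    static final class Checkpoint {
        final long sum;          // the consumer's state
        final int lastProcessed; // "MessageId" of the last message folded into the state

        Checkpoint(long sum, int lastProcessed) {
            this.sum = sum;
            this.lastProcessed = lastProcessed;
        }
    }

    private Checkpoint saved = new Checkpoint(0, -1); // state and position are stored together

    /** Processes the log from the checkpoint; returns early after crashAfter to simulate a crash. */
    void run(List<Integer> log, int crashAfter) {
        long sum = saved.sum;
        // "Seek" to the message right after the one recorded with the state.
        for (int id = saved.lastProcessed + 1; id < log.size(); id++) {
            sum += log.get(id);
            saved = new Checkpoint(sum, id); // state + MessageId written in one step
            if (id == crashAfter) {
                return; // simulated crash; the restart resumes from the saved checkpoint
            }
        }
    }

    long sum() {
        return saved.sum;
    }

    public static void main(String[] args) {
        List<Integer> log = List.of(10, 20, 30, 40);
        CheckpointingConsumerSketch consumer = new CheckpointingConsumerSketch();
        consumer.run(log, 1);  // crashes after the second message
        consumer.run(log, -1); // restarts from the checkpoint without double counting
        System.out.println("sum = " + consumer.sum());
    }
}
```

If the state and the position were saved separately, a crash between the two writes would make the restart either skip or re-apply a message.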
- 36. • Transfer Topic: record the transfer requests
• Cash Transfer Function: perform the cash transfer action
• BalanceUpdate Topic: record the balance-update requests
PulsarCash, powered by Apache Pulsar
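The sketch below is a hypothetical version of the Cash Transfer Function's per-message logic. It assumes that one transfer request becomes two balance-update requests, a debit and a credit, which may go to different partitions of the BalanceUpdate topic; the class and field names are invented for illustration.

```java
import java.util.List;

// Hypothetical sketch of the Cash Transfer Function's per-message logic.
// Assumption: one transfer request fans out into a debit and a credit,
// which may land on different partitions of the BalanceUpdate topic.
final class CashTransferSketch {
    static final class Transfer {
        final String from;
        final String to;
        final long amount;

        Transfer(String from, String to, long amount) {
            this.from = from;
            this.to = to;
            this.amount = amount;
        }
    }

    static final class BalanceUpdate {
        final String account;
        final long delta;

        BalanceUpdate(String account, long delta) {
            this.account = account;
            this.delta = delta;
        }

        @Override
        public String toString() {
            return account + " " + (delta >= 0 ? "+" : "") + delta;
        }
    }

    /** Turns one transfer request into a debit and a credit. */
    static List<BalanceUpdate> process(Transfer t) {
        if (t.amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        return List.of(new BalanceUpdate(t.from, -t.amount), new BalanceUpdate(t.to, t.amount));
    }

    public static void main(String[] args) {
        for (BalanceUpdate u : process(new Transfer("alice", "bob", 100))) {
            System.out.println(u);
        }
    }
}
```

Both updates, plus the acknowledgment of the transfer request, have to take effect together: with only one of them applied, money is created or destroyed.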
- 42. • Atomic writes across multiple partitions
• Atomic acknowledgments across multiple subscriptions
• All the actions made within one transaction either all
succeed or all fail
• Consumers are *ONLY* allowed to read committed
messages
Transaction Semantics
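A toy in-memory model of these semantics may help; it is not how Pulsar implements them, and all names are hypothetical. Writes and acknowledgments issued inside a transaction are buffered, `commit()` applies all of them, `abort()` discards all of them, and `read()` only returns committed messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory model of transaction semantics, not Pulsar's implementation.
final class TxnSemanticsSketch {
    final Map<String, List<String>> committedMessages = new HashMap<>(); // partition -> visible messages
    final Set<String> committedAcks = new HashSet<>();                    // acknowledged message ids

    final class Txn {
        private final List<String[]> pendingWrites = new ArrayList<>(); // {partition, value}
        private final List<String> pendingAcks = new ArrayList<>();
        private boolean done = false;

        void send(String partition, String value) {
            check();
            pendingWrites.add(new String[] {partition, value});
        }

        void ack(String messageId) {
            check();
            pendingAcks.add(messageId);
        }

        // All buffered writes and acks become visible together.
        void commit() {
            check();
            for (String[] w : pendingWrites) {
                committedMessages.computeIfAbsent(w[0], k -> new ArrayList<>()).add(w[1]);
            }
            committedAcks.addAll(pendingAcks);
            done = true;
        }

        // None of the buffered writes or acks ever become visible.
        void abort() {
            check();
            pendingWrites.clear();
            pendingAcks.clear();
            done = true;
        }

        private void check() {
            if (done) {
                throw new IllegalStateException("transaction already finished");
            }
        }
    }

    Txn newTransaction() {
        return new Txn();
    }

    /** Consumers only ever read committed messages. */
    List<String> read(String partition) {
        return committedMessages.getOrDefault(partition, List.of());
    }
}
```

Because this model is single-threaded, applying the buffered operations in `commit()` is trivially atomic; across brokers, that guarantee has to come from the coordinator, transaction buffer and subscription state described in the following slides.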
- 43. Message<String> message = inputConsumer.receive();
// Without a transaction, each send and the ack succeed or fail independently
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage().value("output-message-1").sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage().value("output-message-2").sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId());
Without Transaction API
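The following toy simulation (hypothetical, stdlib only) enumerates where a crash can hit the non-transactional flow above. It assumes the input message is acknowledged last, so any crash before the ack causes redelivery and both sends run again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the non-transactional flow: send output-1, send output-2, then ack the input.
// A crash after k steps means only the first k steps took effect; if the input was not
// acked, it is redelivered and the whole flow runs again (this time without a crash).
final class NonTransactionalCrashSketch {
    static List<String> outcome(int stepsBeforeCrash) {
        String[] steps = {"output-message-1", "output-message-2", "ACK"};
        List<String> outputs = new ArrayList<>();
        boolean acked = false;
        for (int i = 0; i < stepsBeforeCrash; i++) {
            if (steps[i].equals("ACK")) {
                acked = true;
            } else {
                outputs.add(steps[i]);
            }
        }
        if (!acked) {
            // Redelivery after restart: both sends happen again.
            outputs.add("output-message-1");
            outputs.add("output-message-2");
        }
        return outputs;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 3; k++) {
            System.out.println("crash after " + k + " step(s): " + outcome(k));
        }
    }
}
```

A crash after one or two sends leaves duplicate outputs, and between the crash and the restart OutputTopic-1 can hold a message with no matching message on OutputTopic-2. Acknowledging first would instead risk losing the outputs altogether.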
- 45. Message<String> message = inputConsumer.receive();
// Start a transaction with a timeout
Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
// Produce to two topics within the transaction
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value("output-message-1").sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value("output-message-2").sendAsync();
// Acknowledge the input message within the same transaction
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
// Commit: both messages and the ack take effect together
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
Transaction API
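The example above commits unconditionally. One way to handle failures, sketched here under assumptions, is to commit only after every send and ack future has succeeded and to abort otherwise, so the transaction does not linger until its timeout. `TxnHandle` and `CommitOrAbort` are hypothetical stand-ins; Pulsar's `Transaction` interface also offers `commit()` and `abort()` returning `CompletableFuture<Void>`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical stand-in for a transaction handle.
interface TxnHandle {
    CompletableFuture<Void> commit();

    CompletableFuture<Void> abort();
}

final class CommitOrAbort {
    /** Commits if every send/ack future succeeded; otherwise aborts and fails with the original error. */
    static CompletableFuture<Void> commitWhenAllSucceed(TxnHandle txn, List<? extends CompletableFuture<?>> ops) {
        return CompletableFuture.allOf(ops.toArray(new CompletableFuture<?>[0]))
                .handle((ignored, error) -> error) // turn success or failure into a value
                .thenCompose(error -> {
                    if (error == null) {
                        return txn.commit();
                    }
                    // allOf wraps failures in CompletionException; unwrap to report the real cause.
                    Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                            ? error.getCause()
                            : error;
                    return txn.abort().thenCompose(v -> CompletableFuture.<Void>failedFuture(cause));
                });
    }
}
```

With the Pulsar client, `ops` would hold `sendFuture1`, `sendFuture2` and the future returned by `acknowledgeAsync(..., txn)`.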
- 48. • TC: transaction manager, coordinating committing and
aborting transactions
• In-Memory + Transaction Log
• Transaction Log is powered by a partitioned Pulsar topic
• `pulsar/system/__transaction_coordinator_log`
• Locating a TC is locating a partition of the transaction log
topic
Transaction Coordinator (TC)
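A hypothetical sketch of the TC bullets: the coordinator keeps transaction status in memory, appends every status change to its partition of the transaction log before updating memory, and rebuilds its state on restart by replaying that log. Transaction IDs here are pairs of (coordinator ID, sequence number), so the ID alone says which log partition, and therefore which TC, owns a transaction. The log entry format, the `-partition-N` suffix and the class names are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a TC owning one partition of the transaction log topic.
final class CoordinatorSketch {
    enum TxnStatus { OPEN, COMMITTING, COMMITTED, ABORTING, ABORTED }

    final long coordinatorId;                          // = transaction log partition index
    final List<String> transactionLog;                 // stand-in for the log partition
    final Map<Long, TxnStatus> txns = new HashMap<>(); // in-memory state, keyed by sequence ID
    private long nextSequenceId = 0;

    CoordinatorSketch(long coordinatorId, List<String> transactionLog) {
        this.coordinatorId = coordinatorId;
        this.transactionLog = transactionLog;
        // Replay: entries look like "<sequenceId> <STATUS>"; the latest entry per ID wins.
        for (String entry : transactionLog) {
            String[] parts = entry.split(" ");
            long seq = Long.parseLong(parts[0]);
            txns.put(seq, TxnStatus.valueOf(parts[1]));
            nextSequenceId = Math.max(nextSequenceId, seq + 1);
        }
    }

    /** Log partition owned by a TC, assuming Pulsar's "-partition-N" naming convention. */
    static String logPartition(long coordinatorId) {
        return "pulsar/system/__transaction_coordinator_log-partition-" + coordinatorId;
    }

    /** Returns {coordinatorId, sequenceId}; the first half locates the owning TC. */
    long[] newTransaction() {
        long seq = nextSequenceId++;
        update(seq, TxnStatus.OPEN);
        return new long[] {coordinatorId, seq};
    }

    void update(long seq, TxnStatus status) {
        transactionLog.add(seq + " " + status); // write to the log first...
        txns.put(seq, status);                  // ...then update memory
    }
}
```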
- 50. • TB: store and index transaction data per topic partition
• TB is implemented using another ML (managed-ledger) as
TB log
• Messages are appended to the TB log
• Transaction Index is maintained in memory and
snapshotted to ledgers
• Transaction Index can be replayed from TB log
Transaction Buffer (TB)
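A hypothetical sketch of a per-partition transaction buffer: data and commit/abort markers are appended to one log (a `List` standing in for the managed ledger), and the in-memory index from transaction ID to positions is derived purely from that log, so a fresh instance can rebuild it by replay. Snapshotting the index to ledgers is omitted, and all names and marker strings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-partition Transaction Buffer. Simplification: a payload equal
// to one of the marker strings would be misread as a marker.
final class TransactionBufferSketch {
    static final String COMMIT = "<commit>";
    static final String ABORT = "<abort>";

    final List<String[]> tbLog; // entries: {txnId, payload or marker}
    private final Map<String, List<Integer>> openIndex = new LinkedHashMap<>(); // txnId -> positions
    private final List<Integer> committedPositions = new ArrayList<>();

    TransactionBufferSketch(List<String[]> tbLog) {
        this.tbLog = tbLog;
        for (int pos = 0; pos < tbLog.size(); pos++) {
            index(pos); // the index is rebuilt by replaying the TB log
        }
    }

    void append(String txnId, String payload) {
        tbLog.add(new String[] {txnId, payload});
        index(tbLog.size() - 1);
    }

    void commit(String txnId) {
        append(txnId, COMMIT);
    }

    void abort(String txnId) {
        append(txnId, ABORT);
    }

    private void index(int pos) {
        String txnId = tbLog.get(pos)[0];
        String value = tbLog.get(pos)[1];
        if (value.equals(COMMIT)) {
            committedPositions.addAll(openIndex.getOrDefault(txnId, List.of()));
            openIndex.remove(txnId);
        } else if (value.equals(ABORT)) {
            openIndex.remove(txnId);
        } else {
            openIndex.computeIfAbsent(txnId, k -> new ArrayList<>()).add(pos);
        }
    }

    /** Messages a consumer may read: only those of committed transactions, in commit order. */
    List<String> committedMessages() {
        List<String> out = new ArrayList<>();
        for (int pos : committedPositions) {
            out.add(tbLog.get(pos)[1]);
        }
        return out;
    }
}
```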
- 52. • Introduce ACK_PENDING state
• Add a response for acknowledgments, aka ack-on-ack
• Ack state is persisted to the cursor ledger
• Ack state can be replayed from the cursor ledger
Transactional Subscription State
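A hypothetical sketch of the ACK_PENDING idea on a single subscription: a transactional ack reserves the message, a second transaction trying to ack it gets a negative response (standing in for ack-on-ack), commit makes the ack final and abort releases the message. Persisting these transitions to the cursor ledger is left out; the names are invented.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of transactional acknowledgment on one subscription.
final class TxnSubscriptionSketch {
    enum AckState { UNACKED, ACK_PENDING, ACKED }

    private final Map<String, AckState> state = new HashMap<>();      // messageId -> state
    private final Map<String, String> pendingOwner = new HashMap<>(); // messageId -> txnId

    AckState stateOf(String messageId) {
        return state.getOrDefault(messageId, AckState.UNACKED);
    }

    /** The ack response: false if the message is already pending or acked. */
    boolean ack(String messageId, String txnId) {
        if (stateOf(messageId) != AckState.UNACKED) {
            return false; // a conflicting ack is reported back instead of silently ignored
        }
        state.put(messageId, AckState.ACK_PENDING);
        pendingOwner.put(messageId, txnId);
        return true;
    }

    void commit(String txnId) {
        finish(txnId, AckState.ACKED);
    }

    void abort(String txnId) {
        finish(txnId, AckState.UNACKED); // the message can be delivered and acked again
    }

    private void finish(String txnId, AckState outcome) {
        Iterator<Map.Entry<String, String>> it = pendingOwner.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            if (e.getValue().equals(txnId)) {
                state.put(e.getKey(), outcome); // per the slide, this change would go to the cursor ledger
                it.remove();
            }
        }
    }
}
```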
- 54. Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
Transaction API - New Transaction
- 56. CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value("output-message-1").sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value("output-message-2").sendAsync();
Transaction API - Produce Messages
- 57. [Diagram: Pulsar Client (Input Consumer, Producer 1, Producer 2); Coordinator with Transaction Log; Broker-0 and Broker-1 hosting InputTopic (Cursor), OutputTopic-1 and OutputTopic-2 (Data Log and Txn Buffer each). New Txn creates Tx1. Step 2.0 "Add Produced Topics To Txn" records "Tx1: add [T1, T2]"; step 2.1 "Produced Messages To Topics with Txn" writes "Tx1: M1" and "Tx1: M2".]
- 58. inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
Transaction API - Acknowledges
- 59. [Diagram, same layout as slide 57: step 3.0 "Add Acked Subscriptions To Txn" records "Tx1: add [S0]", and "Ack messages with Txn" records "Tx1: ACK (M0)".]
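The "add" records in the diagrams suggest that the coordinator learns every topic partition and subscription a transaction touches before the data arrives. The sketch below is a hypothetical model of that client-side bookkeeping: only the first send to a partition, or the first ack on a subscription, produces a registration request. Batching, as in "Tx1: add [T1, T2]", is simplified to one request per participant, and all names are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical client-side bookkeeping for one transaction: the TC is told about each
// participant once, so it knows whom to notify at commit or abort time.
final class TxnParticipantsSketch {
    final String txnId;
    final List<String> requestsToCoordinator = new ArrayList<>(); // what the TC would log
    private final Set<String> producedTopics = new LinkedHashSet<>();
    private final Set<String> ackedSubscriptions = new LinkedHashSet<>();

    TxnParticipantsSketch(String txnId) {
        this.txnId = txnId;
    }

    void beforeSend(String topicPartition) {
        if (producedTopics.add(topicPartition)) { // only the first send registers the partition
            requestsToCoordinator.add(txnId + ": add topic " + topicPartition);
        }
    }

    void beforeAck(String subscription) {
        if (ackedSubscriptions.add(subscription)) { // only the first ack registers the subscription
            requestsToCoordinator.add(txnId + ": add subscription " + subscription);
        }
    }

    Set<String> producedTopics() {
        return producedTopics;
    }

    Set<String> ackedSubscriptions() {
        return ackedSubscriptions;
    }
}
```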
- 60. txn.commit().get();
Transaction API - Commit
- 61. [Diagram, same layout as slide 57: step 4.0 "Commit Txn" / "Committing Txn" records "Tx1: Committing".]
- 62. [Diagram, same layout as slide 57: step 4.1.0 "Commit Txn On Topics" and step 4.1.1 "Commit Txn On Subscriptions" add commit markers "Tx1 (c)" and show Tx1 as "Committed".]
- 63. [Diagram, same layout as slide 57: step 4.2 "Committed Txn" marks Tx1 as committed.]
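The commit steps 4.0 to 4.2 follow a two-phase pattern. In this hypothetical sketch the coordinator records "Committing" before contacting any participant, asks each registered topic and subscription to commit, and records "Committed" only after all of them succeed. The `Participant` interface and its boolean result are assumptions; retries and timeouts are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the commit flow: log the decision, fan out, then log completion.
final class CommitFlowSketch {
    interface Participant {
        boolean commit(String txnId); // true once the participant has applied the commit
    }

    final List<String> transactionLog = new ArrayList<>();

    /** Returns true if the transaction reached Committed. */
    boolean commit(String txnId, List<Participant> participants) {
        transactionLog.add(txnId + ": Committing"); // 4.0: the decision is recorded first
        for (Participant p : participants) {         // 4.1: commit on topics and subscriptions
            if (!p.commit(txnId)) {
                return false; // stays Committing; in this model a recovering TC would retry, not abort
            }
        }
        transactionLog.add(txnId + ": Committed");   // 4.2: all participants are done
        return true;
    }
}
```

Logging "Committing" first is what makes recovery safe in this model: a coordinator that restarts and finds a transaction in that state can only drive it forward to commit.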
- 64. inputConsumer.receiveAsync().thenCompose(message -> {
return client.newTransaction().withTransactionTimeout(…).build().thenCompose(txn -> {
// Send and acknowledge within the transaction, then commit it
producer1.newMessage(txn).value("output-message-1").sendAsync();
producer2.newMessage(txn).value("output-message-2").sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
return txn.commit();
});
});
Transaction API - Async Example
- 69. • Transaction support in other languages (e.g. C++, Go)
• Transactions in Pulsar Functions & Pulsar IO
• Transactions in Kafka-on-Pulsar (KoP)
• Transactions for Flink / Spark jobs
• Transactions for state storage in Pulsar Functions
• …
Roadmap
- 70. • Ivan Kelly
• Matteo Merli
• Jia Zhai
• Penghui Li
• Marvin Cai
• Yong Zhang
• … and many other Pulsar users & contributors
Credits