Transaction preview of Apache Pulsar
- 6. 2012: Pulsar idea started at Yahoo!
5 years on production, 100+ applications, 10+ data centers
2016/09 Yahoo open sourced Pulsar
2017/06 Yahoo donated Pulsar to ASF
2018/09 Pulsar graduated as a Top-Level project
2018/09 InfoWorld Best Open Source Project
- 10. • At-most once
• At-least once
• Exactly once
Messaging Semantics
Before 1.20.0-incubating
- 11. • At-most once
• At-least once
• Exactly once
Messaging Semantics
PIP-6: Guaranteed Message Deduplication
- 25. • Broker can fail
• The request from Producer to Broker can fail
• Producer or Consumer can fail
Why the duplicates are introduced?
- 27. • Producer: Idempotent Producer
• Broker: Guaranteed Message Deduplication (PIP-6)
• Consumer: Reader + Checkpoints (Flink / Spark)
Message Deduplication
- 28. • Producer Name - Identify who is producing the messages
• Sequence ID - Identify the message
• Producer Name + Sequence ID: The unique identifier for a
message
Idempotent Producer
- 29. • Broker maintains a map between Producer Name and Last-
Produced-Sequence-ID
• Broker accepts messages if the sequence id of a new
message is larger than its last produced sequence id
• Broker treats messages whose sequence id are smaller
• Broker keeps the map in a de-duplication cursor (stored in
bookkeeper)
Guaranteed Message Deduplication
- 39. • It only works when producing messages to one partition
• It only works for producing one message
• There is no atomicity when producing multiple messages to
one partition or many partitions
• Consumers are required to store the MessageId along with
its state and seek back to the MessageId when restoring
the state
Limitations
- 43. • Transfer Topic : record the transfer requests
• Cash Transfer Function: perform the cash transfer action
• BalanceUpdate Topic: record the balance-update requests
PulsarCash, powered by Apache Pulsar
- 49. • Atomic writes across multiple partitions
• Atomic acknowledges across multiple subscriptions
• All the actions made within one transaction either all
succeed or all fail
• Consumers are *ONLY* allowed to read committed
messages
Transaction Semantics
- 50. Message<String> message = inputConsumer.receive();
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage().value(“output-message-1”).sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage().value(“output-message-2”).sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId());
Without Transaction API
- 52. Message<String> message = inputConsumer.receive();
Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value(“output-message-1”).sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value(“output-message-2”).sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
Transaction API
- 55. • TC: transaction manager, coordinating committing and
aborting transactions
• In-Memory + Transaction Log
• Transaction Log is powered by a partitioned Pulsar topic
• `pulsar/system/__transaction_coordinator_log`
• Locating a TC is locating a partition of the transaction log
topic
Transaction Coordinator (TC)
- 57. • TB: store and index transaction data per topic partition
• TB is implemented using another ML (managed-ledger) as
TB log
• Messages are appended to into TB log
• Transaction Index is maintained in memory and
snapshotted to ledgers
• Transaction Index can be replayed from TB log
Transaction Buffer (TB)
- 59. • Introduce ACK_PENDING state
• Add response for acknowledgement, aka ack-on-ack
• Ack state is updated to cursor ledger
• Ack state can be replayed from cursor ledger
Transactional Subscription State
- 61. Message<String> message = inputConsumer.receive();
Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value(“output-message-1”).sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value(“output-message-2”).sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
Transaction API - New Transaction
- 63. Message<String> message = inputConsumer.receive();
Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value(“output-message-1”).sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value(“output-message-2”).sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
Transaction API - Produce Messages
- 64. CoordinatorBroker-0 Broker-1
InputTopic OutputTopic-1 OutputTopic-2
Cursor
Transaction Log
Data Log
Txn Buffer
Data Log
Txn Buffer
Pulsar Client
Input
Consumer
Producer 1 Producer 2
Txn
New Txn
2.0 Add Produced Topics
To Txn
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
2.1 Produced Messages
To Topics with Txn
- 65. Message<String> message = inputConsumer.receive();
Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value(“output-message-1”).sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value(“output-message-2”).sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
Transaction API - Acknowledges
- 66. CoordinatorBroker-0 Broker-1
InputTopic OutputTopic-1 OutputTopic-2
Cursor
Transaction Log
Data Log
Txn Buffer
Data Log
Txn Buffer
Pulsar Client
Input
Consumer
Producer 1 Producer 2
Txn
New Txn
3.0 Add Acked Subscriptions To Txn
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
3.0 Ack messages with Txn
Tx1: ACK (M0)
Tx1: add [S0]
- 67. Message<String> message = inputConsumer.receive();
Transaction txn = client.newTransaction().withTransactionTimeout(…).build().get();
CompletableFuture<MessageId> sendFuture1 =
producer1.newMessage(txn).value(“output-message-1”).sendAsync();
CompletableFuture<MessageId> sendFuture2 =
producer2.newMessage(txn).value(“output-message-2”).sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
Transaction API - Commit
- 68. CoordinatorBroker-0 Broker-1
InputTopic OutputTopic-1 OutputTopic-2
Cursor
Transaction Log
Data Log
Txn Buffer
Data Log
Txn Buffer
Pulsar Client
Input
Consumer
Producer 1 Producer 2
Txn
New Txn
4.0 Commit Txn
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
4.0 Committing Txn
Tx1: Committing
- 69. CoordinatorBroker-0 Broker-1
InputTopic OutputTopic-1 OutputTopic-2
Cursor
Transaction Log
Data Log
Txn Buffer
Data Log
Txn Buffer
Pulsar Client
Input
Consumer
Producer 1 Producer 2
Txn
New Txn
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
4.1.0 Commit Txn
On Topics
4.1.1 Commit Txn
On Subscriptions
Tx1 (c) Tx1 (c)
Tx1: Committing
Tx1: Committed
Tx1: Committed
- 70. CoordinatorBroker-0 Broker-1
InputTopic OutputTopic-1 OutputTopic-2
Cursor
Transaction Log
Data Log
Txn Buffer
Data Log
Txn Buffer
Pulsar Client
Input
Consumer
Producer 1 Producer 2
Txn
New Txn
Tx1
Tx1: add [T1, T2] Tx1: M1 Tx1: M2
Tx1: ACK (M0)
Tx1: add [S0]
Tx1: Committing
Tx1 (c) Tx1 (c)
Tx1: Committed
Tx1: Committed
4.2 Committed Txn
- 71. inputConsumer.receiveAsync().thenCompose(message -> {
return client.newTransaction().withTransactionTimeout(…).build().thenCompose(txn -> {
producer1.newMessage(txn).value(“output-message-1”).sendAsync();
producer2.newMessage(txn).value(“output-message-2”).sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
return txn.commit();
});
})
Transaction API - Async Example
- 76. • Transaction support in other languages (e.g. C++, Go)
• Transaction in Pulsar Functions & Pulsar IO
• Transaction in Kafka-on-Pulsar (KOP)
• Transaction for Flink / Spark job
• Transaction for State storage in Pulsar Functions
• …
Roadmap
- 77. • Ivan Kelly
• Matteo Merli
• Jia Zhai
• Penghui Li
• Marvin Cai
• Yong Zhang
• … and many other Pulsar users & contributors
Credits
- 78. Wechat Subscription: ApachePulsar
Mailing Lists
dev@pulsar.apache.org, users@pulsar.apache.org
Slack
https://apache-pulsar.slack.com (#china)
register: https://apache-pulsar.herokuapp.com/
https://github.com/apache/pulsar
https://github.com/apache/bookkeeper