Preview of Apache Pulsar 2.5.0
- 1. Preview of Apache Pulsar 2.5.0
Transactional streaming
Sticky consumer
Batch receiving
Namespace change events
- 2. Messaging semantics - 1
1. At least once: ack only after successful processing; negative-ack on failure so the broker redelivers
Message<String> msg = consumer.receive();
try {
    // processing
    consumer.acknowledge(msg);
} catch (Exception e) {
    consumer.negativeAcknowledge(msg);
}
2. At most once: ack unconditionally, so a failed message is never redelivered
Message<String> msg = consumer.receive();
try {
    // processing
} catch (Exception e) {
    log.error("processing error", e);
} finally {
    consumer.acknowledge(msg);
}
3. Exactly once?
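The difference between the two semantics can be simulated without a broker: under at-least-once, a crash after processing but before the ack causes redelivery and duplicate processing; nothing is ever lost. A minimal sketch (plain Java, no Pulsar dependency; the queue below is a hypothetical stand-in for the broker's redelivery behavior):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DeliverySemanticsDemo {

    // Simulate at-least-once: a crash after processing but before the ack
    // makes the broker redeliver, so the message is processed twice.
    static List<String> runAtLeastOnce() {
        Deque<String> queue = new ArrayDeque<>(List.of("m1"));
        List<String> processed = new ArrayList<>();
        boolean crashedOnce = false;
        while (!queue.isEmpty()) {
            String msg = queue.poll();
            processed.add(msg);              // processing succeeds
            if (!crashedOnce) {
                crashedOnce = true;          // crash before ack: broker redelivers
                queue.addFirst(msg);
            }
            // else: acknowledged, the message is gone for good
        }
        return processed;
    }

    public static void main(String[] args) {
        // Duplicates are possible, but nothing is lost.
        System.out.println(runAtLeastOnce()); // prints [m1, m1]
    }
}
```

Under at-most-once the ack would happen before the crash, so the queue would simply drain and the message would be lost instead of duplicated.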
- 5. Messaging semantics - 4
Limitations of effectively once
1. Only works when producing to a single partition
2. Only works when producing a single message at a time
3. Only works when consuming from a single partition
4. Consumers are required to store the message id and state themselves in order to restore after a crash
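Limitation 4 above amounts to a dedup filter on the consumer side: record the ids already processed and skip redelivered duplicates. A minimal sketch (plain Java; persisting `processedIds` atomically with the processing result is left out, and that is exactly the hard part in practice):

```java
import java.util.HashSet;
import java.util.Set;

public class EffectivelyOnceFilter {
    // In a real consumer this set must be persisted atomically with the
    // processing result; otherwise a crash reintroduces duplicates.
    private final Set<String> processedIds = new HashSet<>();

    /** Returns true if the message should be processed, false if it is a duplicate. */
    public boolean shouldProcess(String messageId) {
        return processedIds.add(messageId);
    }
}
```

A redelivered message id returns false on the second call, so the processing step is skipped even though the broker delivered it again.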
- 6. Streaming processing - 1
(Diagram: function f consumes message A from Topic-1 and produces B = f(A) to Topic-2)
1. Receive message A from Topic-1 and do some processing
- 8. Streaming processing - 3
(Diagram: same pipeline, step 3)
3. Get the send response from Topic-2
How to handle a send-response timeout or a consumer/function crash?
Ack message A = at most once
Nack message A = at least once
- 9. Streaming processing - 4
(Diagram: same pipeline, step 4)
4. Ack message A
How to handle an ack failure or a consumer/function crash?
- 10. Transactional streaming semantics
1. Atomic multi-topic publish and acknowledge
2. A message in an open transaction is dispatched to only one consumer until the transaction commits or aborts
3. Only committed messages can be read by consumers (READ_COMMITTED)
https://github.com/apache/pulsar/wiki/PIP-31%3A-Transaction-Support
- 11. Transactional streaming demo
Message<String> message = inputConsumer.receive();
Transaction txn = client.newTransaction()
        .withTransactionTimeout(…)
        .build().get();
// Publish to two output topics and ack the input inside the same transaction
CompletableFuture<MessageId> sendFuture1 =
        producer1.newMessage(txn).value("output-message-1").sendAsync();
CompletableFuture<MessageId> sendFuture2 =
        producer2.newMessage(txn).value("output-message-2").sendAsync();
inputConsumer.acknowledgeAsync(message.getMessageId(), txn);
// Commit makes both outputs visible and the input ack durable, atomically
txn.commit().get();
MessageId msgId1 = sendFuture1.get();
MessageId msgId2 = sendFuture2.get();
- 14. Batch receiving messages
Consumer<byte[]> consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .batchReceivePolicy(BatchReceivePolicy.builder()
                .maxNumMessages(100)
                .maxNumBytes(2 * 1024 * 1024)
                .timeout(1, TimeUnit.SECONDS)
                .build())
        .subscribe();
Messages<byte[]> msgs = consumer.batchReceive();
// process the batch
https://github.com/apache/pulsar/wiki/PIP-38%3A-Batch-Receiving-Messages
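The three limits in the policy above are OR-ed: `batchReceive()` returns as soon as any one of max messages, max bytes, or the timeout is hit. The trigger logic can be sketched standalone (plain Java; this class is an illustration of the policy semantics, not Pulsar's implementation):

```java
public class BatchTrigger {
    private final int maxNumMessages;
    private final long maxNumBytes;
    private int numMessages;
    private long numBytes;

    public BatchTrigger(int maxNumMessages, long maxNumBytes) {
        this.maxNumMessages = maxNumMessages;
        this.maxNumBytes = maxNumBytes;
    }

    /** Record one incoming message; returns true once the batch should be flushed. */
    public boolean add(long messageSizeBytes) {
        numMessages++;
        numBytes += messageSizeBytes;
        // The timeout is the third trigger, handled by a timer in the real client.
        return numMessages >= maxNumMessages || numBytes >= maxNumBytes;
    }
}
```

Whichever limit trips first flushes the batch, so a stream of a few large messages and a stream of many small ones both produce bounded batches.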
- 17. Bo Cong / 丛搏
Pulsar Schema
Messaging system R&D engineer at Zhaopin.com (智联招聘)
Core contributor to Pulsar Schema and HDFS offload
- 22. Compatibility strategy evolution
Backward: a version 2 schema can read version 1, and version 1 can read version 0, but version 2 may not be able to read version 0
Backward Transitive: a version 2 schema can read both version 1 and version 0
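The strategy is configured per namespace. With `pulsar-admin` in recent Pulsar versions this looks like the following (the namespace `public/default` is just an example):

```shell
# Require that a new schema can read all previous versions, not only the latest
bin/pulsar-admin namespaces set-schema-compatibility-strategy \
    --compatibility BACKWARD_TRANSITIVE \
    public/default
```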
- 23. A schema evolution example
Version 1:
class Person {
    @Nullable
    String name;
}
Version 2:
class Person {
    String name;
}
Version 3:
class Person {
    @Nullable
    @AvroDefault("\"Zhang San\"")
    String name;
}
Version 3 can read data written with the earlier versions (the default fills in a missing name), but version 2 can't read version 1 data whose name is null.
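Version 3's `@AvroDefault` maps to a `default` in the generated Avro schema, which is what lets a version-3 reader fill in a value for records written without the field. The generated schema would look roughly like this (a sketch; field order in the union depends on the Avro version):

```json
{
  "type": "record",
  "name": "Person",
  "fields": [
    { "name": "name", "type": ["string", "null"], "default": "Zhang San" }
  ]
}
```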
- 26. Producing messages with different schema versions
Producer<V1Data> p = pulsarClient.newProducer(Schema.AVRO(V1Data.class))
        .topic(topic).create();
Consumer<V2Data> c = pulsarClient.newConsumer(Schema.AVRO(V2Data.class))
        .topic(topic)
        .subscriptionName("sub1").subscribe();
p.newMessage().value(data1).send();
p.newMessage(Schema.AVRO(V2Data.class)).value(data2).send();
p.newMessage(Schema.AVRO(V1Data.class)).value(data3).send();
Message<V2Data> msg1 = c.receive();
V2Data msg1Value = msg1.getValue();
Message<V2Data> msg2 = c.receive();
Message<V2Data> msg3 = c.receive();
V2Data msg3Value = msg3.getValue();
- 29. What is Apache Pulsar?
Flexible pub/sub messaging backed by durable log storage
- 31. How does Pulsar handle it?
Pulsar Kafka wrapper: a Pulsar client behind the Kafka Java API
https://pulsar.apache.org/docs/en/adaptors-kafka/
Pulsar IO connectors
https://pulsar.apache.org/docs/en/io-overview/
- 37. KoP Feasibility — Others
Producer Consumer
Topic Lookup
Produce
Consume
Offset
Consumption State
- 38. KoP Overview
(Architecture diagram: Kafka producers and consumers using the Kafka lib, and Pulsar producers and consumers using the Pulsar lib, all connect to the same broker. The broker runs a Pulsar protocol handler and a Kafka protocol handler side by side in front of a shared Pulsar topic, on top of the usual Pulsar stack: load balancer, geo-replicator, managed ledger, BookKeeper client, bookies, and ZooKeeper.)
- 39. KoP Implementation
Topic flat map: Broker sets `kafkaNamespace`
Message ID and Offset: LedgerId + EntryId
Message: Convert Key/value/timestamp/headers(properties)
Topic Lookup: Pulsar admin topic lookup -> owner broker
Produce: Convert, then call PulsarTopic.publishMessage
Consume: Convert, then call non-durable-cursor.readEntries
Group coordinator: state kept in the topic `public/__kafka/__offsets`
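The "Message ID and Offset" mapping above packs a Pulsar (ledgerId, entryId) pair into the single 64-bit offset that the Kafka protocol expects. One way to sketch the idea (an illustration only; KoP's actual bit split may differ) is to reserve the high bits for the ledger id and the low bits for the entry id:

```java
public final class OffsetCodec {
    // Illustrative split: 44 bits for ledgerId, 20 bits for entryId.
    private static final int ENTRY_BITS = 20;
    private static final long ENTRY_MASK = (1L << ENTRY_BITS) - 1;

    /** Pack a (ledgerId, entryId) pair into one Kafka-style long offset. */
    public static long toOffset(long ledgerId, long entryId) {
        return (ledgerId << ENTRY_BITS) | (entryId & ENTRY_MASK);
    }

    public static long ledgerId(long offset) {
        return offset >>> ENTRY_BITS;
    }

    public static long entryId(long offset) {
        return offset & ENTRY_MASK;
    }
}
```

Because ledger ids grow monotonically and entries are ordered within a ledger, offsets built this way preserve the ordering Kafka consumers rely on for seeking and committing.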
- 41. Ordering
Guaranteed ordering
Multi-tenancy: a single cluster can support many tenants and use cases
High throughput: can reach 1.8 M messages/s in a single partition
Durability: data replicated and synced to disk
Geo-replication: out-of-the-box support for geographically distributed applications
Unified messaging model: supports both streaming and queuing
Delivery guarantees: at least once, at most once, and effectively once
Low latency: publish latency as low as 5 ms
Highly scalable & available: can support millions of topics
- 44. Demo1: K-Producer -> K-Consumer
(Diagram: a Kafka producer and a Kafka consumer, both on the Kafka lib, talk to the broker's Kafka protocol handler and share one Pulsar topic)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
- 46. Demo1: P-Producer -> K-Consumer
(Diagram: a Pulsar producer on the Pulsar lib publishes via the Pulsar protocol handler; a Kafka consumer on the Kafka lib reads the same Pulsar topic via the Kafka protocol handler)
bin/pulsar-client produce test -n 1 -m "Hello from Pulsar Producer, Message 1"
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
- 48. Demo1: K-Producer -> P-Consumer
(Diagram: a Kafka producer on the Kafka lib publishes via the Kafka protocol handler; a Pulsar consumer on the Pulsar lib reads the same Pulsar topic)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/pulsar-client consume -s sub-name test -n 0
- 51. Demo2: Kafka Connect
(Diagram: a Kafka Connect file source reads an input file and publishes to the topic via the Kafka protocol handler; a Kafka Connect file sink writes the topic out to an output file)
bin/connect-standalone.sh \
    config/connect-standalone.properties \
    config/connect-file-source.properties \
    config/connect-file-sink.properties
- 53. Demo2: Pulsar Functions
(Diagram: the same Kafka Connect file source/sink setup, with a Pulsar Function also subscribed to the topic and writing to a separate output topic)
bin/pulsar-admin functions localrun \
    --name pulsarExclamation \
    --jar pulsar-functions-api-examples.jar \
    --classname org…ExclamationFunction \
    --inputs connect-test-partition-0 \
    --output out-hello