How Zhaopin contributes to Pulsar community
PHOTO
Zhaopin in Pulsar community
Penghui Li 李鹏辉
Messaging platform leader at zhaopin.com
Apache Pulsar Committer
Our team
Penghui Li Bo Cong
Apache Pulsar in zhaopin.com
2018/08
First service for online
applications
2018/10
1 billion / day
2019/02
6 billion / day
2019/08
20 billion / day
50+ Namespaces
5000+ Topics
1. Features Zhaopin contributed to the community

2. Details of Key_shared subscription

3. Release Pulsar

4. Details of Pulsar multiple schema version

5. Details of HDFS Offloader
Dead letter topic
Topic Topic-DLQConsumer
… 2 1 0 1
message 1 failed processing too many times
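The slide above shows the dead letter topic mechanism: once a message has failed processing more than a configured number of times, it is routed to a companion `-DLQ` topic instead of being redelivered forever. A minimal stdlib sketch of that routing decision (class name and threshold handling are illustrative; in Pulsar's Java client this behavior is configured on the consumer via its dead letter policy):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of dead-letter routing: once a message id has failed more than
// maxRedeliverCount times, route it to the "<topic>-DLQ" topic.
public class DeadLetterRouter {
    private final int maxRedeliverCount;
    private final Map<Long, Integer> failures = new HashMap<>();

    public DeadLetterRouter(int maxRedeliverCount) {
        this.maxRedeliverCount = maxRedeliverCount;
    }

    /** Returns the topic the failed message should go to next. */
    public String onProcessingFailure(String topic, long messageId) {
        int count = failures.merge(messageId, 1, Integer::sum);
        return count > maxRedeliverCount ? topic + "-DLQ" : topic;
    }
}
```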
Client interceptors
Producer Topic Consumer
Send
Send Ack
Receive
Acknowledge
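Client interceptors hook into the producer's send/send-ack path and the consumer's receive/acknowledge path, letting applications observe or rewrite messages as they pass through. A stdlib sketch of the chaining idea (the interface here is illustrative, not Pulsar's actual `ProducerInterceptor`/`ConsumerInterceptor` signatures):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative interceptor chain: each interceptor may observe or rewrite
// a message before it is sent (or after it is received).
public class InterceptorChain {
    private final List<UnaryOperator<String>> interceptors = new ArrayList<>();

    public InterceptorChain add(UnaryOperator<String> interceptor) {
        interceptors.add(interceptor);
        return this;
    }

    /** Runs the message through every interceptor, in registration order. */
    public String beforeSend(String message) {
        String m = message;
        for (UnaryOperator<String> i : interceptors) {
            m = i.apply(m);
        }
        return m;
    }
}
```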
Time partitioned un-ack message tracker
p-4 p-3 p-2 p-1 p-0
Current partition Timeout partition
Add messages to tracker Send redelivery request
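The tracker above buckets un-acked messages by arrival time: new messages go into the current partition (p-0), and on each timer tick the oldest partition (p-4) times out, triggering a redelivery request for everything still un-acked in it. A stdlib sketch of that rotation (partition count and method names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative time-partitioned un-ack tracker: adds go to the newest
// partition; each tick retires the oldest partition and reports any
// message still un-acked in it for redelivery.
public class UnackTracker {
    private final ArrayDeque<Set<Long>> partitions = new ArrayDeque<>();

    public UnackTracker(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.addLast(new HashSet<>());
        }
    }

    public void add(long messageId) {
        partitions.peekLast().add(messageId);
    }

    public void ack(long messageId) {
        for (Set<Long> p : partitions) {
            p.remove(messageId);
        }
    }

    /** Advances time by one partition; returns the ids to redeliver. */
    public List<Long> tick() {
        List<Long> timedOut = new ArrayList<>(partitions.removeFirst());
        partitions.addLast(new HashSet<>());
        return timedOut;
    }
}
```

Grouping timeouts by partition means the tracker touches one bucket per tick instead of scanning a per-message timer for every outstanding message.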
Message redelivery optimization
7 3 2 0 6 5 4 1
0 8
Consumer internal queue
Consumer Broker
0 1 2 3
4 5 6 7
0
Key_shared subscription
A new subscription mode in 2.4.0
Producer 1
Producer 2
Pulsar
topic
<k1,v0>
<k2,v1>
<k3,v2>
<k2,v3>
<k1,v4>
Subscription
Consumer D-1
Consumer D-2
Consumer D-3
<k1,v0>
<k1,v4>
<k3,v2>
<k2,v1>
<k2,v3>
Start with Key_shared subscription
Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("my-sub")
.subscriptionType(SubscriptionType.Key_Shared)
.subscribe();
How Key_shared subscription works
Sticky key dispatcher (auto-split hash range)
0 65536
Consumer-1
0 1 2 3 4 5 6 7 8 9
How Key_shared subscription works
Sticky key dispatcher (auto-split hash range)
0 65536
Consumer-1
0 1 2 3 4 5 6 7 8 9
Consumer-2
How Key_shared subscription works
Sticky key dispatcher (auto-split hash range)
0 65536
Consumer-1
0 1 2 3 4 5 6 7 8 9
Consumer-2 Consumer-3
How Key_shared subscription works
Sticky key dispatcher (auto-split hash range)
0 65536
Consumer-1
0 1 2 3 4 5 6 7 8 9
Consumer-4
How Key_shared subscription works
Sticky key dispatcher (auto-split hash range)
0 65536
Consumer-1
0 1 2 3 4 5 6 7 8 9
Consumer-3 Consumer-4
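The slides above show the sticky key dispatcher hashing each message key into the range [0, 65536) and giving every consumer a sub-range; when a consumer joins, an existing range is split for it, and when one leaves its range is taken over by a neighbor. A simplified stdlib sketch of the key-to-consumer selection (the auto-split bookkeeping is elided; range boundaries below are illustrative):

```java
import java.util.TreeMap;

// Illustrative sticky-key routing: each consumer owns a hash sub-range of
// [0, 65536); a message key is hashed and routed to the range owner.
public class StickyKeyRouter {
    static final int RANGE_SIZE = 65536;
    // Maps the start of each range to its owning consumer.
    private final TreeMap<Integer, String> rangeStarts = new TreeMap<>();

    public void assign(int rangeStart, String consumer) {
        rangeStarts.put(rangeStart, consumer);
    }

    /** Hashes the key and returns the consumer owning that hash slot. */
    public String route(String key) {
        int hash = Math.floorMod(key.hashCode(), RANGE_SIZE);
        return rangeStarts.floorEntry(hash).getValue();
    }
}
```

Because routing depends only on the key hash and the current range table, all messages with the same key reach the same consumer, which is what preserves per-key ordering.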
Key-based message batcher
p-0 p-1 p-2
<k3,v0>
<k2,v0>
<k2,v1>
<k3,v1>
<k1,v1>
<k4,v0>
<k5,v0>
<k6,v0>
<k6,v1>
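With the default batcher, messages with different keys can land in the same batch, which defeats per-key dispatch under Key_shared; the key-based batcher instead buffers entries per key so each flushed batch carries a single key. A stdlib sketch of the grouping step (names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative key-based batching: buffer messages per key, so a flushed
// batch only ever contains one key.
public class KeyBasedBatcher {
    private final Map<String, List<String>> buffers = new LinkedHashMap<>();

    public void add(String key, String value) {
        buffers.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Flushes all buffers; each returned batch holds a single key. */
    public List<List<String>> flush() {
        List<List<String>> batches = new ArrayList<>(buffers.values());
        buffers.clear();
        return batches;
    }
}
```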
Use Key-based message batcher
Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("my-sub")
.subscriptionType(SubscriptionType.Key_Shared)
.batcherBuilder(BatcherBuilder.KEY_BASED)
.subscribe();
Pulsar SQL improvements
✓ Namespace delimiter rewriter

✓ Partition as internal column

✓ Primitive schema handling

➡ Multiple-version schema handling
Some other improvements
✓ Service URL provider
✓ Consumer reconnect limiter
➡ Batch message receive
Next
★ Topic level policy
★ Sticky consumer
2.4.0 Release
1. New branch and tag

2. Stage release (check -> sign -> stage)

3. Move master to new version and write release notes

4. Start vote

5. Promote release and publish

6. Update site and announce the release
PHOTO
Schema versioning & HDFS offloader
Bo Cong 丛搏
Messaging platform engineer at zhaopin.com
Apache Pulsar contributor
The meaning of multi-version schema
message 1 message 2 message 3 message 4 message 5
A message's schema is not immutable
schema: version 0 version 1 version 2 version 3 version 4
Problems caused by version changes
Version 0:
class Person {
    int id;
}
Version 1 (can read Version 0 data, because the new field has a default):
class Person {
    int id;
    @AvroDefault("\"Zhang San\"")
    String name;
}
Version 2 (can't read Version 0 data, because the new field has no default):
class Person {
    int id;
    String name;
}
Change in compatibility policy
Backward: version 2 can read version 1; version 1 can read version 0
Backward Transitive: version 2 can read version 1 and version 0; version 1 can read version 0
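The difference between the two policies is which pairs get checked when a new schema is registered: Backward checks the new schema only against the latest existing version, while Backward Transitive checks it against every existing version. A stdlib sketch, with "version A can read version B" modeled as an abstract predicate standing in for a real Avro schema-resolution check:

```java
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative compatibility check. canRead.test(newSchema, oldSchema)
// stands in for real Avro schema resolution.
public class CompatibilityChecker {
    public static <S> boolean isCompatible(
            S newSchema, List<S> existing, BiPredicate<S, S> canRead,
            boolean transitive) {
        if (existing.isEmpty()) {
            return true;
        }
        if (!transitive) { // BACKWARD: check against the latest version only
            return canRead.test(newSchema, existing.get(existing.size() - 1));
        }
        // BACKWARD_TRANSITIVE: must be able to read every older version
        for (S old : existing) {
            if (!canRead.test(newSchema, old)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this model, the Version 2 Person from the previous slide passes Backward (it can read Version 1) but fails Backward Transitive (it cannot read Version 0).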
Schema creation process
Schema data can arrive through the admin client API, the admin REST API, a producer create, or a consumer subscribe.
The SchemaRegistryService runs a compatibility check of the new schema against the old schema: if compatible, a new version is assigned; if incompatible, the request is rejected.
Multi-version use in Pulsar Avro schema
message 1
version 0
message 2
version 1
message 3
version 2
version 3
consumer
SchemaInfoProvider
Message
exist
new AvroReader()
Multi-version use in Pulsar Avro schema
new ReflectDatumReader<>(writerSchema, readerSchema)
ReaderCache
Version0
read
not exist
find schema by version 0
from broker
read
If the writer schema differs from the reader schema, the Avro reader must be constructed with both schemas so records written with an older version can be resolved to the current one.
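The flow above can be sketched as a reader cache keyed by schema version: on a cache miss, the consumer fetches the writer schema for that version from the broker and builds a reader for the (writer, reader) schema pair, mirroring `new ReflectDatumReader<>(writerSchema, readerSchema)`. A stdlib sketch, where the lookup function stands in for the broker round-trip:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative per-version reader cache: one reader per writer-schema
// version, created lazily via a broker-lookup function.
public class ReaderCache<R> {
    private final Map<Long, R> readers = new HashMap<>();
    private final Function<Long, R> createReaderForVersion;

    public ReaderCache(Function<Long, R> createReaderForVersion) {
        this.createReaderForVersion = createReaderForVersion;
    }

    /** Returns a cached reader, building one on first use of a version. */
    public R readerFor(long schemaVersion) {
        return readers.computeIfAbsent(schemaVersion, createReaderForVersion);
    }
}
```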
Multi-version use in Pulsar auto-consume schema
Only AvroSchema and JsonSchema are supported
GenericAvroRecord GenericJsonRecord
getField
Unlike JsonSchema or AvroSchema, the reader only needs the writer schema.
Consumer<GenericRecord> consumer = client
.newConsumer(Schema.AUTO_CONSUME())
.topic("test")
.subscriptionName("test")
.subscribe();
The use of schema definition
class Person {
@Nullable
String name;
}
SchemaDefinition<Person> schemaDefinition =
SchemaDefinition.<Person>builder()
.withAlwaysAllowNull(false)
.withPojo(Person.class).build();
Producer<Person> producer = client
.newProducer(Schema.AVRO(schemaDefinition))
.topic("persistent://public/default/test")
.create();
Why do we need HDFS offloader
BookKeeper HDFS
ManagedLedger
(Broker)
• Hot and cold data separation
SSD
HDD
High throughput
Low latency Massive data storage
Offload topic ledgers to HDFS
stored relative path
tenant/namespace/topic/ledgerId + "-" + uuid.toString()
topic
ledger 1
ledger 2
ledger 3
index
data
index
data
index
data
HDFS Offloader storage structure
• Storage format: org.apache.hadoop.io.MapFile
Index: entryID entryID entryID ...
Data: (entryID, entryData) (entryID, entryData) (entryID, entryData)
Configuring HDFS offloader
broker.conf
managedLedgerOffloadDriver=filesystem
offloadersDirectory=./offloaders
fileSystemURI=hdfs://127.0.0.1:9000
fileSystemProfilePath=../conf/filesystem_offload_core_site.xml
Thanks!
