How Zhaopin contributes to Pulsar community
Zhaopin in Pulsar community
Penghui Li 李鹏辉
Messaging platform lead at zhaopin.com
Apache Pulsar Committer
- 4. Apache Pulsar in zhaopin.com
2018/08
First service for online applications goes live
2018/10
1 billion / day
2019/02
6 billion / day
2019/08
20 billion / day
50+ Namespaces
5000+ Topics
- 5. 1. Features Zhaopin has contributed to the community
2. Details of the Key_Shared subscription
3. The Pulsar release process
4. Details of Pulsar multiple schema versions
5. Details of the HDFS offloader
- 8. Time-partitioned un-ack message tracker
Partitions: p-4 p-3 p-2 p-1 p-0
Current partition: add messages to the tracker
Timeout partition: send a redelivery request
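The two arrows above can be sketched as a rotating deque of partitions: new un-acked messages always land in the current partition, and on each timer tick the oldest partition times out and its messages are redelivered. This is a minimal model with hypothetical names, not Pulsar's actual UnAckedMessageTracker:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

/** Simplified sketch of a time-partitioned un-ack message tracker. */
public class TimePartitionedTracker {
    private final ArrayDeque<Set<String>> partitions = new ArrayDeque<>();

    public TimePartitionedTracker(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.addLast(new HashSet<>());
        }
    }

    /** New un-acked messages are always added to the current (newest) partition. */
    public void add(String messageId) {
        partitions.peekLast().add(messageId);
    }

    /** Acknowledged messages are removed from whichever partition holds them. */
    public void ack(String messageId) {
        for (Set<String> p : partitions) {
            if (p.remove(messageId)) return;
        }
    }

    /** On each timer tick the oldest partition times out: its remaining
     *  messages are returned for redelivery and an empty partition rotates in. */
    public Set<String> tick() {
        Set<String> timedOut = partitions.pollFirst();
        partitions.addLast(new HashSet<>());
        return timedOut;
    }
}
```

A message survives as many ticks as there are partitions before it is considered timed out, which gives a coarse timeout without tracking a timestamp per message.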
- 10. Key_Shared subscription
A new subscription mode in 2.4.0
Producer 1 and Producer 2 publish to the Pulsar topic: <k1,v0> <k2,v1> <k3,v2> <k2,v3> <k1,v4>
The subscription dispatches messages by key:
Consumer D-1: <k1,v0> <k1,v4>
Consumer D-2: <k3,v2>
Consumer D-3: <k2,v1> <k2,v3>
- 11. Start with Key_Shared subscription
Consumer consumer = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("my-sub")
    .subscriptionType(SubscriptionType.Key_Shared)
    .subscribe();
- 14. How Key_Shared subscription works
Sticky key dispatcher (auto-split hash range)
Hash range [0, 65536), split among Consumer-1, Consumer-2, and Consumer-3
- 16. How Key_Shared subscription works
Sticky key dispatcher (auto-split hash range)
Hash range [0, 65536), re-split after membership changes: Consumer-1, Consumer-3, and Consumer-4
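The auto-split idea can be sketched in a few lines: keys hash into [0, 65536), each consumer owns a contiguous range, and a joining consumer takes the upper half of the currently largest range. This is an illustrative model with hypothetical names, not Pulsar's actual sticky key dispatcher:

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of Key_Shared auto-split hash-range assignment. */
public class AutoSplitHashRange {
    static final int RANGE_SIZE = 65536;
    // Maps the end of each range (exclusive) to the consumer that owns it.
    private final TreeMap<Integer, String> rangeEndToConsumer = new TreeMap<>();

    /** A joining consumer takes the upper half of the currently largest range. */
    public void addConsumer(String consumer) {
        if (rangeEndToConsumer.isEmpty()) {
            rangeEndToConsumer.put(RANGE_SIZE, consumer);
            return;
        }
        int bestStart = 0, bestEnd = 0, prevEnd = 0;
        String splitVictim = null;
        for (Map.Entry<Integer, String> e : rangeEndToConsumer.entrySet()) {
            if (e.getKey() - prevEnd > bestEnd - bestStart) {
                bestStart = prevEnd;
                bestEnd = e.getKey();
                splitVictim = e.getValue();
            }
            prevEnd = e.getKey();
        }
        int mid = (bestStart + bestEnd) / 2;
        rangeEndToConsumer.put(mid, splitVictim);   // victim keeps the lower half
        rangeEndToConsumer.put(bestEnd, consumer);  // newcomer takes the upper half
    }

    /** Messages are routed by hashing the key into [0, 65536). */
    public String select(String key) {
        int slot = (key.hashCode() & Integer.MAX_VALUE) % RANGE_SIZE;
        return rangeEndToConsumer.higherEntry(slot).getValue();
    }
}
```

Because a key always hashes to the same slot and a slot always falls in exactly one range, all messages for a key go to one consumer, which is the ordering guarantee Key_Shared provides.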
- 19. Use the key-based message batcher
Producer<byte[]> producer = client.newProducer()
    .topic("my-topic")
    .batcherBuilder(BatcherBuilder.KEY_BASED)
    .create();
- 20. Pulsar SQL improvements
✓ Namespace delimiter rewriter
✓ Partition as an internal column
✓ Primitive schema handling
➡ Multiple-version schema handling
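To make the first item concrete: Pulsar SQL names schemas after the namespace, "tenant/namespace", and the '/' can clash with SQL tooling, so a rewriter substitutes a configured delimiter in both directions. The class and method names below are hypothetical, a sketch of the idea rather than the actual implementation:

```java
/** Hypothetical sketch of a namespace delimiter rewriter for SQL schema names. */
public class NamespaceDelimiterRewriter {
    private final String delimiter;

    public NamespaceDelimiterRewriter(String delimiter) {
        this.delimiter = delimiter;
    }

    /** "tenant/namespace" -> SQL-safe schema name. */
    public String rewrite(String namespace) {
        return namespace.replace("/", delimiter);
    }

    /** SQL schema name -> original "tenant/namespace". */
    public String restore(String schemaName) {
        return schemaName.replace(delimiter, "/");
    }
}
```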
- 23. 2.4.0 Release
1. New branch and tag
2. Stage release (check -> sign -> stage)
3. Move master to new version and write release notes
4. Start vote
5. Promote release and publish
6. Update site and announce the release
- 25. The meaning of multi-version schema
message 1   message 2   message 3   message 4   message 5
version 0   version 1   version 2   version 3   version 4
A message's schema is not immutable: different messages can carry different schema versions.
- 26. Problems caused by version changes
// Version 0
class Person {
    int id;
}

// Version 1
class Person {
    int id;
    String name;
}

// Version 2
class Person {
    int id;
    @AvroDefault("\"Zhang San\"")
    String name;
}

Version 1 can't read Version 0 data; Version 2 can read it, because the new field has a default value to fall back on.
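The "can read / can't read" rule above follows from Avro schema resolution: a reader schema can decode a writer's data only if every reader field missing from the writer schema has a default. A toy model (not real Avro, which also checks types, aliases, and promotions):

```java
import java.util.Map;
import java.util.Set;

/** Toy model of the Avro rule behind the Person example above. */
public class CanReadCheck {
    /**
     * @param readerFields field name -> whether it has a default value
     * @param writerFields field names present in the writer schema
     */
    public static boolean canRead(Map<String, Boolean> readerFields, Set<String> writerFields) {
        for (Map.Entry<String, Boolean> f : readerFields.entrySet()) {
            boolean hasDefault = f.getValue();
            if (!writerFields.contains(f.getKey()) && !hasDefault) {
                return false; // no value in the data and no default to fall back on
            }
        }
        return true;
    }
}
```

With Version 0 as writer: Version 1's `name` has no default, so the read fails; Version 2's `name` carries `@AvroDefault`, so the read succeeds.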
- 27. Change in compatibility policy
Backward: version 2 can read version 1, and version 1 can read version 0.
Backward Transitive: version 2 can read version 1 and version 0, and version 1 can read version 0.
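The difference between the two policies is only which history is checked: BACKWARD compares the new schema against the latest version, BACKWARD_TRANSITIVE against every previous version. A sketch with a pluggable `canRead` standing in for the real Avro compatibility check (names are illustrative):

```java
import java.util.List;
import java.util.function.BiPredicate;

/** Sketch of BACKWARD vs BACKWARD_TRANSITIVE compatibility checking. */
public class CompatibilityPolicy {
    public static <S> boolean isCompatible(S newSchema, List<S> history,
                                           BiPredicate<S, S> canRead, boolean transitive) {
        if (history.isEmpty()) return true;
        if (!transitive) {
            // BACKWARD: only the latest version must be readable.
            return canRead.test(newSchema, history.get(history.size() - 1));
        }
        // BACKWARD_TRANSITIVE: every previous version must be readable.
        for (S old : history) {
            if (!canRead.test(newSchema, old)) return false;
        }
        return true;
    }
}
```

A schema that can read only its immediate predecessor passes BACKWARD but fails BACKWARD_TRANSITIVE, which is exactly the trap the transitive policy exists to catch.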
- 28. Schema creation process
Schema data reaches the SchemaRegistryService from the admin client API, the admin REST API, producer create, or consumer subscribe.
The SchemaRegistryService runs a compatibility check of the new schema against the old schema:
an old schema returns its existing version, a compatible new schema is assigned a new version, and an incompatible one is rejected.
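The registration flow can be sketched as one method: look up the schema, run the compatibility check, and either return the existing version, assign a new one, or reject. Class and method names are hypothetical, loosely modeled on the flow above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Minimal sketch of a schema registry's create path. */
public class SchemaRegistryServiceSketch {
    private final List<String> schemas = new ArrayList<>();
    private final BiPredicate<String, String> compatible;

    public SchemaRegistryServiceSketch(BiPredicate<String, String> compatible) {
        this.compatible = compatible;
    }

    /** Returns the assigned version, or -1 when the schema is incompatible. */
    public long putSchemaIfAbsent(String newSchema) {
        int existing = schemas.indexOf(newSchema);
        if (existing >= 0) return existing;            // old schema: return its version
        if (!schemas.isEmpty()
                && !compatible.test(newSchema, schemas.get(schemas.size() - 1))) {
            return -1;                                 // incompatible: reject
        }
        schemas.add(newSchema);                        // new schema: assign next version
        return schemas.size() - 1;
    }
}
```

This sketch checks only the latest schema, i.e. a BACKWARD-style policy; a transitive policy would loop over the whole list.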
- 29. Multi-version use in Pulsar Avro schema
message 1 was written with version 0, message 2 with version 1, message 3 with version 2;
the consumer reads with schema version 3 and must handle every version it encounters.
- 30. SchemaInfoProvider
Read path for a message carrying schema version 0:
- If a reader for that version exists in the ReaderCache, use it to read.
- If not, find the schema by version 0 from the broker, build a new AvroReader
  with new ReflectDatumReader<>(writerSchema, readerSchema), cache it, then read.
If the writer schema differs from the reader schema, Avro needs a reader built
from both the writer and the reader schema.
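The ReaderCache step above is essentially a per-version memoization: a cache miss triggers the by-version schema lookup and reader construction, a hit reuses the reader. A sketch with a stubbed reader factory (names are illustrative, not Pulsar's classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of a reader cache keyed by schema version. */
public class ReaderCacheSketch<R> {
    private final Map<Long, R> cache = new ConcurrentHashMap<>();
    // Stands in for: fetch writer schema by version from the broker,
    // then build new ReflectDatumReader<>(writerSchema, readerSchema).
    private final Function<Long, R> newReaderForVersion;

    public ReaderCacheSketch(Function<Long, R> newReaderForVersion) {
        this.newReaderForVersion = newReaderForVersion;
    }

    public R readerFor(long schemaVersion) {
        // exist -> reuse cached reader; not exist -> build and cache one
        return cache.computeIfAbsent(schemaVersion, newReaderForVersion);
    }
}
```

Caching matters here because building an Avro datum reader is far more expensive than a map lookup, and a topic usually has few schema versions but many messages per version.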
- 31. Multi-version use in Pulsar auto-consume schema
Auto-consume only supports AvroSchema and JsonSchema, deserializing messages into
GenericAvroRecord or GenericJsonRecord, whose fields are accessed with getField.
Unlike JsonSchema or AvroSchema, the reader only needs the writer schema.
Consumer<GenericRecord> consumer = client
    .newConsumer(Schema.AUTO_CONSUME())
    .topic("test")
    .subscriptionName("test")
    .subscribe();
- 32. The use of schema definition
class Person {
    @Nullable
    String name;
}

SchemaDefinition<Person> schemaDefinition = SchemaDefinition.<Person>builder()
    .withAlwaysAllowNull(false)
    .withPojo(Person.class)
    .build();

Producer<Person> producer = client
    .newProducer(Schema.AVRO(schemaDefinition))
    .topic("persistent://public/default/test")
    .create();
- 33. Why do we need the HDFS offloader
The ManagedLedger (in the broker) writes to BookKeeper and offloads to HDFS.
Cold and hot data separation: BookKeeper on SSD gives high throughput and low latency;
HDFS on HDD gives massive, cheap data storage.
- 34. Offload topic ledgers to HDFS
Each ledger of a topic (ledger 1, ledger 2, ledger 3, ...) is offloaded as an index plus a data part,
stored under the relative path:
tenant/namespace/topic/ledgerId + "-" + uuid.toString()
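The relative path rule above is simple string concatenation; a sketch (the helper name is hypothetical, the path layout is from the slide):

```java
import java.util.UUID;

/** Builds the relative HDFS path for an offloaded ledger:
 *  tenant/namespace/topic/ledgerId + "-" + uuid */
public class OffloadPath {
    public static String dataPath(String tenant, String namespace, String topic,
                                  long ledgerId, UUID uuid) {
        return tenant + "/" + namespace + "/" + topic + "/" + ledgerId + "-" + uuid;
    }
}
```

The random UUID suffix keeps a retried offload of the same ledger from colliding with a previous, possibly partial, attempt.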
- 35. HDFS offloader storage structure
The storage format uses org.apache.hadoop.io.MapFile:
Data: a sequence of (entryID, entryData) pairs, in entryID order
Index: a sampled sequence of entryIDs pointing into the data
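The index/data split above can be modeled in a few lines: every entry goes to the data part, and every Nth entryID is also recorded in the index, so a read can seek close to an entry and then scan forward. This is a toy model; Hadoop's real MapFile also records file offsets per index entry:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the MapFile-style index/data layout. */
public class MapFileModel {
    private final int indexInterval;
    private final List<long[]> data = new ArrayList<>();   // {entryId, payload}
    private final List<Long> index = new ArrayList<>();    // every Nth entryId

    public MapFileModel(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    public void append(long entryId, long payload) {
        if (data.size() % indexInterval == 0) {
            index.add(entryId);   // sample this entryId into the index
        }
        data.add(new long[] {entryId, payload});
    }

    public int indexSize() { return index.size(); }
    public int dataSize() { return data.size(); }
}
```

The interval trades index size against seek precision: a smaller interval means a larger index but fewer data records to scan past after a seek.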