Easily Build a Smart Pulsar Stream Processor (Simon Crosby)
- 2. SwimOS is an Apache 2.0 licensed platform that makes
it easy to build applications that deliver continuous
intelligence from streaming data, at scale
swimos.org
- 4. • Apache Pulsar
• Apache Kafka
• Apache Beam
• CNCF NATS
• Amazon Kinesis
• Google Pub/Sub
• Azure Enterprise Data Bus
• Salesforce Kafka
• Confluent Cloud
• …
Streaming Platforms
➢ SwimOS is a stream processor that delivers continuous intelligence
from streaming data
• Support pub/sub at scale
• Buffer data between pubs & subs
• Event-time ordered delivery
• Events stored in arrival order
• Don’t run applications
- 5. • Stream processors subscribe to a broker to analyze
streaming event data
• Their insights can be asynchronously consumed by
publishing back to the broker
• The broker offers a low-latency API that gives the stream
processor events in real-time
• Pulsar does not control execution of the stream processor
Stream Processors
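The broker/processor split described on this slide can be sketched in a few lines. This is a toy, in-process illustration only: `InMemoryBroker`, `OverheatDetector`, the topic names and the `limit` threshold are all hypothetical, delivery here is synchronous, and none of it is the Pulsar client API.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Toy stand-in for a pub/sub broker (hypothetical, not the Pulsar API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: dict) -> None:
        # Hand the event to every subscriber of the topic, in subscription order.
        for callback in list(self._subscribers[topic]):
            callback(event)


class OverheatDetector:
    """Hypothetical stream processor: subscribes to raw events, publishes insights."""

    def __init__(self, broker: InMemoryBroker, source_topic: str,
                 insight_topic: str, limit: float) -> None:
        self._broker = broker
        self._insight_topic = insight_topic
        self._limit = limit
        # The processor registers itself; the broker does not control how it runs.
        broker.subscribe(source_topic, self.on_event)

    def on_event(self, event: dict) -> None:
        if event["engine_temp"] > self._limit:
            self._broker.publish(
                self._insight_topic,
                {"device": event["device"], "alert": "overheat"},
            )


broker = InMemoryBroker()
insights: List[dict] = []
broker.subscribe("insights", insights.append)   # a downstream consumer of insights
OverheatDetector(broker, "telemetry", "insights", limit=250)

broker.publish("telemetry", {"device": "truck-1", "engine_temp": 290})
broker.publish("telemetry", {"device": "truck-2", "engine_temp": 180})
```

Only the first event exceeds the assumed limit, so a single insight is published back to the broker for other consumers to pick up.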
- 6. SwimOS is a Stateful, Real-time Stream Processor
• Builds and auto-scales apps from real-world event data, creating a
stateful graph that continuously computes – driven by data
• Automates infrastructure operation
• Load balances, secures, persists and auto-scales the application
• Apps are easy to develop
• Delivers unimaginable performance
Application: Distributed, stateful, concurrent graph of
Web Agents & real-time UIs
Infra: Distributed, p2p mesh of instances on k8s using
WebSockets
- 7. Major Mobile Provider
• > 150M devices
• > 10Gb/s of streaming data from Pulsar
• Continuous analysis, aggregation & reduction
• Millisecond latency
• Pervasively real-time UI
• Distributed across AZ
- 8. Pulsar’s Many Pros
• Event Processing
– Filtering
– Transformation
– Counts / Windows
– Alerts
• Serverless is a great abstraction
• SQL-style API
• Storage tiering
• Delivery guarantees
• Multi-tenancy
• Replication
• Scaling
Database
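As a rough illustration of the "Counts / Windows" and "Alerts" items above, the following sketch counts events per key in fixed, non-overlapping (tumbling) windows and flags busy keys. It is plain Python, not Pulsar Functions; the function name `tumbling_counts`, the sample events and the alert threshold of 2 are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tumbling_counts(events: Iterable[Tuple[float, str]],
                    width_s: float) -> List[Tuple[float, Counter]]:
    """Count events per key in fixed windows of width_s seconds.

    Events are (timestamp_seconds, key) pairs, assumed sorted by timestamp.
    Returns a list of (window_start, counts) in time order.
    """
    windows: List[Tuple[float, Counter]] = []
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = (ts // width_s) * width_s      # start of the window containing ts
        if current_start is not None and start != current_start:
            windows.append((current_start, counts))   # close the previous window
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        windows.append((current_start, counts))       # close the last open window
    return windows


events = [(0.5, "a"), (1.2, "b"), (4.9, "a"), (5.0, "a"), (9.9, "b"), (12.0, "a")]
windows = tumbling_counts(events, width_s=5.0)

# Alert on any key seen at least twice within one window (hypothetical rule).
alerts = [(start, key) for start, c in windows for key, n in c.items() if n >= 2]
```

Note that windows with no events are simply absent from the result; a production windowing operator would also need to handle late and out-of-order events, which this sketch ignores.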
- 9. • How many topics do you need?
Challenges…
- 11. [Diagram: an Application and five Clients]
• Databases don’t drive computation!
(though in-memory is faster)
• What DB architecture do you need?
• Scaling / clustering / consistency …
Streaming analytics (#solved !)
☞ polling → not real-time
engine_temp: 290
fan_temp: 188
coolant_vol: 25
Continuous Intelligence demands
• Data driven computation
• Analysis in context, everywhere, concurrently
• Stateful, in-memory, distributed
• Pervasively real-time computation
Challenges…
- 15. Users Want Stateful, Continuous, Contextual Analysis
Streams are a sequence of state changes
They never stop… (so “store-then-analyze” is silly)
“Meaning” depends on granular contextual
relationships
Applications always have to have an answer
- 16. Introducing Swim Web Agents
• SwimOS subscribes to event streams from real-world sources
• It creates a stateful, concurrent web agent for each data source
• Each web agent cleans, labels and analyzes data from its real-world twin
• Agents dynamically link to related agents, creating a stateful in-memory graph
• Containment, proximity… logical relationships, e.g. pod/cluster …
• Computed relationships: correlated…
• Linked web agents share their states in real-time
• Web Agents are vertices in the graph
• Each continually computes on its own state & state of its links, as data flows
over the graph – and streams its results in real-time over its links
• This is data-driven, stateful, continuous computation
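The "one stateful agent per data source" idea can be sketched as follows. This is a minimal plain-Python model, not the SwimOS API: `WebAgent`, `AgentRuntime`, the event shape, the source ids and the `HOT_ENGINE_TEMP` threshold are all hypothetical. The reading names reuse the sample fields shown earlier in the talk.

```python
from typing import Dict

HOT_ENGINE_TEMP = 250.0  # hypothetical threshold, not from the talk


class WebAgent:
    """Hypothetical stateful twin of one real-world source."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id
        self.state: Dict[str, float] = {}

    def on_event(self, readings: dict) -> None:
        # "Clean": keep only numeric readings and merge them into the state.
        for key, value in readings.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                self.state[key] = float(value)

    @property
    def label(self) -> str:
        # "Label": derive a simple status from the current state.
        return "hot" if self.state.get("engine_temp", 0.0) > HOT_ENGINE_TEMP else "ok"


class AgentRuntime:
    """Creates an agent the first time a source appears in the stream."""

    def __init__(self) -> None:
        self.agents: Dict[str, WebAgent] = {}

    def dispatch(self, event: dict) -> None:
        source = event["source"]
        if source not in self.agents:
            self.agents[source] = WebAgent(source)   # the data drives agent creation
        self.agents[source].on_event(event["readings"])


runtime = AgentRuntime()
stream = [
    {"source": "engine-7", "readings": {"engine_temp": 290, "fan_temp": 188, "coolant_vol": 25}},
    {"source": "engine-9", "readings": {"engine_temp": 120, "fan_temp": None}},
    {"source": "engine-7", "readings": {"engine_temp": 300}},
]
for event in stream:
    runtime.dispatch(event)
```

After the three events there are two agents, each holding the latest cleaned state of its own source; no agent is declared in advance.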
- 17. Web Agents Continuously Compute - Driven by Data
MapReduce
Graph
Analytics
Learning & Prediction
Analyze data to determine state
Relational
Relational Analysis
Real-world Stateful Web Agent
- 20. • Noisy / redundant updates are discarded
I’m still red
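One simple way to discard redundant updates is to push a value only when it differs from the last one seen. The sketch below assumes that reading of the slide; `ChangeOnlyState` and the traffic-light values are hypothetical names for illustration, not SwimOS classes.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ChangeOnlyState(Generic[T]):
    """Holds one value and notifies a listener only when the value changes."""

    def __init__(self, on_change: Callable[[T], None]) -> None:
        self._on_change = on_change
        self._has_value = False
        self._value = None

    def set(self, value: T) -> bool:
        """Store value; return True if it was pushed, False if it was redundant."""
        if self._has_value and value == self._value:
            return False                  # "I'm still red": no push
        self._value = value
        self._has_value = True
        self._on_change(value)            # state changed: push to the listener
        return True


pushed: List[str] = []
light = ChangeOnlyState(pushed.append)
results = [light.set(color) for color in ["red", "red", "red", "green", "green", "red"]]
```

Six updates arrive but only three are pushed downstream, which is where the bandwidth and processing savings come from.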
- 21. I’m still red
I’m green
No push
No push
Streaming data auto-scales the application, which is composed of
concurrent web agents, at low cost, in real time, as data arrives
- 24. Web agents continuously compute on their own
state and the state of linked web agents
enabling granular contextual analysis on-the-fly
① SwimOS creates a web agent
for each source in streaming data
② Agents interlink to reflect
real-world relationships
③ Powerful operators for analysis, learning &
prediction continuously compute on state
& stream results
Web Agents Link to Form a Computational Graph
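Steps ② and ③ can be sketched as agents that link to one another and recompute whenever a linked agent's state changes. This is a single-threaded toy in plain Python, not the SwimOS API: `Agent`, `AggregateAgent`, `Recorder`, `link_to` and `on_link_update` are hypothetical names, and the pod/cluster load figures are made up for the example.

```python
from typing import Dict, List, Tuple


class Agent:
    """Hypothetical agent: holds a value and streams changes over its links."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = None
        self._links: list = []            # downstream agents linked to this one

    def link_to(self, downstream) -> None:
        self._links.append(downstream)

    def set(self, value) -> None:
        if value == self.value:
            return                        # unchanged state is not re-streamed
        self.value = value
        for downstream in self._links:
            downstream.on_link_update(self.name, value)


class AggregateAgent(Agent):
    """Computes on the state of its linked members (here: the maximum)."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.members: Dict[str, float] = {}

    def on_link_update(self, member: str, value: float) -> None:
        self.members[member] = value
        self.set(max(self.members.values()))   # recompute, then stream if changed


class Recorder:
    """Stands in for a real-time UI that receives streamed results."""

    def __init__(self) -> None:
        self.seen: List[Tuple[str, float]] = []

    def on_link_update(self, source: str, value: float) -> None:
        self.seen.append((source, value))


pod_a, pod_b = Agent("pod-a"), Agent("pod-b")
cluster = AggregateAgent("cluster")
ui = Recorder()
pod_a.link_to(cluster)     # containment relationship: pod -> cluster
pod_b.link_to(cluster)
cluster.link_to(ui)

pod_a.set(0.40)
pod_b.set(0.90)
pod_a.set(0.50)            # cluster maximum is still 0.90, so nothing is streamed
pod_b.set(0.20)
```

The cluster agent never polls its pods: each pod update triggers a recomputation, and the UI only hears about the cluster when the aggregate actually changes.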
- 26. • A scaled application is a graph
dynamically built from data
• Objects are stateful and
concurrent
- 27. SwimOS Eliminates “the Stack”
Developer defines entities & their relationships as Java objects
Swim builds a stateful, distributed graph of concurrent web agents
that represent real-world sources, from streaming data
Web agents collaborate to analyze, learn, predict and
respond on the fly
They continuously stream real-time insights to UIs & applications
- 28. Pulsar and Swim: Better Together
• Builds and auto-scales apps from real-world event data, creating a
stateful graph that continuously computes – driven by data
• Automates infrastructure operation
• Load balances, secures, persists and auto-scales the application
• Apps are easy to develop
• Delivers unimaginable performance
Application: Distributed, stateful, concurrent graph of
Web Agents & real-time UIs
Infra: Distributed, p2p mesh of instances on k8s using
WebSockets