Aljoscha Krettek
@aljoscha
Big Data Spain
November 17, 2016
Apache Flink for IoT:
How Event-Time Processing
Enables Easy and Accurate
Analytics
What I’d Like to Talk About
 Streaming architecture and Flink
 IoT and event-time stream processing
 Use-case examples
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Intro: The Streaming Architecture
Big Data Architecture
 Collect events in HDFS (or similar)
 Periodically run (batch) jobs to process
 Problems:
• Huge latency
• Natural boundaries in data don’t match batch
boundaries
Rethinking Data Architecture
 Real-time reaction to events
 Continuous applications
 Process both real-time and historical data
What is (Distributed) Streaming
 Streaming:
Computations on never-
ending “streams” of data
records (“events”)
 Distributed:
Computation spread
across many machines
What is Stateful Streaming
 Result depends on history
of stream
 A stateful stream
processor should give you
the tools to manage state
• Recover, roll back, version,
upgrade, etc.
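A minimal sketch of what "result depends on history" and "recover, roll back" can mean in practice. The `RunningCounter` class and its `snapshot`/`restore` methods are hypothetical names chosen for illustration; they are not Flink's API, and a real stream processor would persist state durably rather than in memory.

```python
import copy
from collections import defaultdict


class RunningCounter:
    """Hypothetical stateful operator: counts events per key."""

    def __init__(self):
        # The state: each result depends on all events seen so far.
        self.counts = defaultdict(int)

    def process(self, key):
        """Update the state for one event and return the new count."""
        self.counts[key] += 1
        return self.counts[key]

    def snapshot(self):
        """Return an independent copy of the state (a checkpoint)."""
        return copy.deepcopy(dict(self.counts))

    def restore(self, snap):
        """Roll the state back to a previously taken checkpoint."""
        self.counts = defaultdict(int, snap)


op = RunningCounter()
for sensor in ["a", "b", "a"]:
    op.process(sensor)

checkpoint = op.snapshot()   # state after three events
op.process("a")              # an event processed after the checkpoint
op.restore(checkpoint)       # roll back, e.g. after a failure
result_after_restore = op.process("a")  # the event is replayed
```

After the rollback, replaying the event yields the same count as the first time, which is the property a checkpoint-and-replay scheme relies on.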
What is Event-Time Streaming
 Events have timestamps
 Processing depends on
timestamps
 An event-time stream
processor should give you the
tools to reason about time
• Handle streams that are out of
order
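One way to handle an out-of-order stream is to buffer events and release them only once a watermark has passed their timestamp. The sketch below is written in that spirit; it is not Flink's API. The function name, the `max_out_of_orderness` parameter, and the sample data are assumptions made for illustration (the arrival order t3, t1, t2, t4 follows the diagram on this slide).

```python
import heapq


def reorder_by_event_time(events, max_out_of_orderness):
    """Yield (timestamp, value) pairs in event-time order.

    Assumption: no event arrives more than `max_out_of_orderness`
    time units behind the largest timestamp seen so far. Events that
    violate this bound would be emitted late, out of order.
    """
    buffer = []                 # min-heap ordered by timestamp
    max_ts = float("-inf")
    for seq, (ts, value) in enumerate(events):
        # seq breaks ties so values are never compared directly
        heapq.heappush(buffer, (ts, seq, value))
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness
        # Everything at or below the watermark is safe to release.
        while buffer and buffer[0][0] <= watermark:
            t, _, v = heapq.heappop(buffer)
            yield t, v
    # End of input: flush whatever is still buffered.
    while buffer:
        t, _, v = heapq.heappop(buffer)
        yield t, v


# Events arrive as t3, t1, t2, t4.
arrivals = [(3, "c"), (1, "a"), (2, "b"), (4, "d")]
ordered = list(reorder_by_event_time(arrivals, max_out_of_orderness=2))
```

The trade-off is latency: a larger bound tolerates more disorder but holds results back longer.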
[Diagram: your code with state; events arrive out of order (t3, t1, t2, t4) and are grouped into windows t1-t2 and t3-t4]
[Diagram: event log, three applications each with app state, query service]
Recap: What is Streaming?
 Continuous processing of data that is
continuously generated
 I.e., pretty much all “big” data
 It’s all about state and time
 Flink does all of that
IoT and Event-time Stream
Processing
The 'Internet Of Everything' Will
Generate $14.4 Trillion Of Value Over
The Next Decade. [1]
[1] read.bi/1yDOQQ3
Example Event Sources
A Simple Definition
IoT use cases from the system’s
perspective:
A large number of (distributed) things
continuously generating a large amount
of data.
IoT: Some Insights
 Data is continuously produced
→ Stream Processing
 Events have a timestamp
→ Event-time based processing
 Data/Events can arrive with huge
delays/out-of-order
 Most analyses happen on time windows
What Is Event-Time Processing
[Figure: Star Wars films as an out-of-order stream.
Processing time (release year): 1977, 1980, 1983, 1999, 2002, 2005, 2015
Event time (episode): IV, V, VI, I, II, III, VII]
What Is Event-Time Processing
[Figure: a message queue of events whose event timestamps are out of order relative to processing time (1 to 14)]
What’s The Problem?
[Figure: the same out-of-order events grouped once into processing-time windows and once into event-time windows]
Mismatch between event time
and processing time.
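The mismatch can be made concrete with a small sketch. The `(event_time, arrival_time)` pairs below are hypothetical data, not the numbers from the figure, and `tumbling_window_counts` is an illustrative helper, not a Flink function. The same events are counted in tumbling windows of size 5, once by event time and once by arrival (processing) time.

```python
from collections import defaultdict


def tumbling_window_counts(times, size):
    """Count how many timestamps fall into each tumbling window.

    Windows are [0, size), [size, 2*size), ... and are keyed by
    their start time.
    """
    counts = defaultdict(int)
    for t in times:
        window_start = (t // size) * size
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical events: (event_time, arrival_time).
# The events with event times 3 and 7 are delayed in transit.
events = [(1, 1), (2, 2), (3, 6), (5, 5), (6, 7), (7, 12)]

by_event_time = tumbling_window_counts([e for e, _ in events], size=5)
by_processing_time = tumbling_window_counts([a for _, a in events], size=5)

print(by_event_time)       # {0: 3, 5: 3}
print(by_processing_time)  # {0: 2, 5: 3, 10: 1}
```

Only the event-time grouping reflects when things actually happened; the processing-time grouping changes with every network delay.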
Sources of Time Mismatch
 Big Mismatch
• Network disconnects
• Slow network
 Small Mismatch
• The nature of distributed systems
• Differing system clock time
Small Event-Time Mismatch
Robust Stream Processing with Apache Flink®:
A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/
Recap: Event-Time
 IoT use cases need event-time processing
 Even small mismatch of event
time/processing time will lead to wrong
results
Use-Case Examples
30 Flink applications in production for more than one
year. 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7,
processing 30 billion events daily, maintaining state
of 100s of GB with exactly-once guarantees
King
 Challenges:
• Many games (Candy Crush, Farm Heroes,
Pet Rescue, and Bubble Witch…)
• 300 million monthly unique users
• 30 billion events received every day
 Need event-time based statistics
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
Solution: RBEA
 Multiplexing of multiple data scientist
requests into a single Flink job
 Groovy as language for analysis scripts
 Event-time windowing
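The multiplexing idea can be sketched as a single job that runs every registered analysis script against each incoming event. This is only an illustration of the pattern under assumed names: `MultiplexedJob`, `register`, the event fields, and the two sample scripts are all hypothetical, the scripts are Python callables rather than Groovy, and none of this is taken from King's RBEA implementation.

```python
from collections import defaultdict


class MultiplexedJob:
    """Hypothetical single job serving many analysis scripts."""

    def __init__(self):
        self.scripts = {}                  # script name -> callable
        self.outputs = defaultdict(list)   # script name -> results

    def register(self, name, script):
        """Add an analysis script; it sees every event from now on."""
        self.scripts[name] = script

    def process(self, event):
        """Run every registered script on one event (a single pass)."""
        for name, script in self.scripts.items():
            result = script(event)
            if result is not None:         # None means "nothing to emit"
                self.outputs[name].append(result)


job = MultiplexedJob()
job.register(
    "purchases",
    lambda e: e["amount"] if e["type"] == "purchase" else None,
)
job.register(
    "level_ups",
    lambda e: e["user"] if e["type"] == "level_up" else None,
)

for event in [
    {"type": "purchase", "user": "u1", "amount": 5},
    {"type": "level_up", "user": "u2"},
    {"type": "purchase", "user": "u2", "amount": 3},
]:
    job.process(event)
```

The stream is read once regardless of how many scripts are registered, which is the point of sharing one job across many requests.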
Bouygues Telecom
                        2015         2016
Users*                  ~120         ~300
Flink production apps   5            30
Storage                 750 TB       2 PB
Events/day              4 billion    10 billion
* Users of the information system
http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
Bouygues: Challenges
 Low-latency counters computed in a streaming fashion
 Massive amounts of data + bursty loads
 Reliability
 Multiple flow correlation
 Time management:
• Out of order & late events → our worst enemies
In Summary
 If you need to ask: you already have a
streaming use case!
 IoT requires Proper Time Management
 Apache Flink has done that for a long time
now*
* Since version 0.10
Thank you!
@aljoscha
@ApacheFlink
@dataArtisans
One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website:
http://sf.flink-forward.org
Follow us on Twitter:
@FlinkForward
We are hiring!
data-artisans.com/careers


Editor's Notes

  1. E.g., counters, windows of past events, state machines, trained ML models